Ease pressure on PIs


Britain’s Research Excellence Framework assessment needs to boost support for over-burdened 
principal investigators — a point missed in a review of the process. 


Last week, a committee led by economist Nicholas Stern published
recommendations on how to improve the United Kingdom’s
Research Excellence Framework (REF). Rightly, they support
the principle and much of the practice of this periodic assessment of
the research strengths of UK universities — which drives the allocation
of universities’ core funding. Although many academics resent having
to submit their achievements every few years, the review concludes
that the REF’s substantial costs are greatly outweighed by its benefits.

Importantly, the recommendations seek to mitigate distortions
introduced when institutions attempt to game the REF, for example
by claiming credit for papers written by staff before they joined (see
Nature http://doi.org/bm9x; 2016). They also support efficient
documentation of the societal impacts of academic research.

But the report has not gone far enough on behalf of the linchpins of
research: principal investigators. It has not recognized the threats to
good science that arise from the overwhelming pressures now being
placed on them.

The Stern review has made one very positive recommendation in
that direction: it suggests that heads of academic departments and
institutes could take the heat off their staff by no longer requiring
each investigator to submit the same number of research outputs to
the REF. For the next assessment, in a few years’ time, departments
would instead be required to submit only a certain number of outputs
overall: some principal investigators might report more than average;
some might even report none. A good institute head would balance
their virtues on the basis of the long-term character of their research.

The motivation for this is clear when the review says that a priority
is to find “ways to ensure that the REF can encourage researchers to
explore big or fundamental problems, in ways that may not deliver a 
steady stream of papers or a quick monograph; to deliver academically 
excellent synthesis of evidence and meta-analysis to support policy 
making; and to develop game changing ideas that, for example, can 
lead to the development of new disciplines, or that have significant 
impact outside their discipline”. 

But the pressures on principal investigators arise not only from 
research accountability. Alas, researchers are merely human. They 
have finite bandwidths, and it is difficult to balance their duties when 
journals, funders and universities are rightly increasing their demands 
for better data management and sharing, better reproducibility, bet- 
ter mentoring of postdocs and graduate students, better teaching and 
broader stakeholder engagement. No wonder many of the best prin- 
cipal investigators are wilting under the stress, and even leaving aca- 
demia. This is exacerbated when funding allocation is ultracompetitive. 

The REF should attend to this when it assesses a department's research 
environment. Stern’s recommendations would empower universities to 
strengthen research cultures. But institutions and funders should act 
directly to mitigate pressures on principal investigators, for example by 
supporting staff for data-management planning and sharing, crafting 
grant applications and administrative tasks. This would combat a creed 
that research money is best spent only on yet more postdocs. 

The REF should help institutions to counter such instincts without 
compromising the creative autonomy of the principal investigators on 
whom they depend. The Stern review could and should have pushed 
harder in that direction.


Cyborg Olympics 
The Olympic Games celebrate physical skill; the 
Cybathlon honours innovation in prosthetics. 


What defines human physical excellence? Is it the pain, sweat
and grit of elite athletes using every slight genetic advantage
to perfect their bodies for competition? Or is human
ingenuity also to be celebrated, particularly when science can allow 
disabled athletes — who are just as gritty and driven as their able- 
bodied counterparts — to compete on a level playing field? 

The Olympics (see page 18) and Paralympics already struggle with 
this question. Now, into the debate comes a ‘cyborg Olympics’ that 
melds human and machine to create a new kind of athlete. In Octo- 
ber, nearly 80 teams from 25 countries around the world will gather 
in Zurich, Switzerland, to compete in the Cybathlon (see page 20). 


Each team is made up of engineers and scientists who have created 
a powered prosthesis for a disabled ‘pilot’ to use in one of six
competitions. Electrical stimulation of paralysed leg muscles allows pilots with
spinal-cord injuries to ride bikes. Other races use robotic prosthetic 
arms to complete tasks such as setting a dinner table, or track brain 
activity to race avatars on a screen. 

What sets the Cybathlon apart from other sporting events is how it 
celebrates human technological achievement rather than just physi- 
cal excellence. The Olympic Games strictly limit the technology that 
athletes can use, for instance requiring cyclists to ride bikes that adhere 
to tight standards. The Cybathlon, by contrast, limits the humans, 
requiring that its cyclists must not be able to move their legs without 
the help of artificial stimulation. 

The goals of the two events are very different, of course. The 
Olympics is a competition for fans’ entertainment and athletes’ 
glory, whereas the Cybathlon is intended to kick-start innovation 
in prosthetics for real-world uses. And as technology and opportu- 
nities develop, they should also spark broader debate about human 
enhancement.
MICHAEL TEMCHINE


WORLD VIEW A personal take on events


Donald Trump’s appeal
should be a call to arms


Trump’s nomination as Republican presidential candidate is a reminder that
scientific progress has not benefited all Americans, says Daniel Sarewitz.


If, as the French counter-revolutionary Joseph de Maistre wrote in
1811, every nation gets the government it deserves, what might the
United States have done to deserve Donald Trump?

A well-functioning democracy should undercut the appeal of
blustering, xenophobic demagogues by ensuring that most citizens
have a stake in government and hope for the future. And although no
single cause or problem can explain Trump’s appeal to a large part of
the American electorate, his nomination as the Republican presidential
candidate should be cause for serious reflection about what is going
wrong in America. For many Americans, one thing that has
gone wrong is that the promise of scientific and technological
progress has not been fulfilled.

This promise is at the heart of the American identity: it is anchored
by founding fathers Benjamin Franklin and Thomas Jefferson,
scientists and inventors both, extolled by Alexis
de Tocqueville in his 1835 masterwork Democracy in America,
embodied in the inventions of Thomas Edison, and codified in its
modern form in Science, The Endless Frontier, Vannevar Bush’s
famous 1945 science-policy report to President
Franklin D. Roosevelt, which laid out the still-powerful
argument for government sponsorship of basic science.

Indeed, Bush’s linking of the frontier metaphor
to the promise of scientific progress was a distinctively
American flourish. And his formula was
simple: three factors — “the free play of initiative
of a vigorous people under democracy, the heritage
of great natural wealth, and the advance of
science and its application” — would deliver to all
Americans full employment and rising standards of living, improved
health, and military security. Government investment in science,
especially research carried out at the nation’s elite universities, would
prime the pump of continual progress.

Not everyone, however, was buying Bush’s story. Starting in the
early 1940s, Senator Harley Kilgore, a Democrat from West Virginia,
championed a different national approach to science policy, one in
which government investment would focus research and development
directly on social goals and economic growth. A six-year political
battle between Kilgore and Bush followed, to control not just US
science policy itself, but, equally importantly, the rhetoric of science
and progress. Bush, who had much of the leadership of academic
and industrial science on his side, and who saw Kilgore as a threat to
the independence of both elite academic science and the economic
marketplace, became the decisive winner on both fronts: the 1950
bill creating the National Science Foundation gave scientists primary
responsibility for determining the agency’s research agenda.

Over the subsequent 65 years, scientists and science advocates
have not shirked from parroting Bush’s Endless Frontier vision of
scientific knowledge, flowing from “the free play of free intellects”,
as an unalloyed good from which all citizens would benefit through 
the ever-expanding economic opportunities created by science-based 
innovation. It has been an appealingly non-ideological view of pro- 
gress, adopted across the political spectrum. As Nobel-prizewinning 
physicist Leon Lederman put it in 1992: “What's good for American 
science ... is good for America.” 

Maybe not. Although Trump supporters are by no means a 
homogeneous lot, a clever analysis in The New York Times in 
March showed that they can most reliably be characterized by two 
attributes. First, they identify their ancestral heritage as Ameri- 
can, rather than any particular ethnic or religious stock. And sec- 
ond, they live in regions of the country that have not only failed to 
benefit economically from innovation, but have been harmed by it. 

Mainstream media analysis of the Trump 
phenomenon almost never links it to the science 
and technology policies pursued by the nation 
since the Second World War. Yet technological 
revolutions arising from these policies have con- 
tributed to more than 40 years of wealth inequal- 
ity, disappearing middle-class jobs and eviscerated 
manufacturing communities in the places where 
support for Trump is strongest. Indeed, economic 
theory throws aside these millions of people as 
the inevitable losers in the ‘creative destruction’
that science catalyses, as if ruined cities and liveli- 
hoods are just side effects of the strong medicine 
of science-based innovation. These people are the 
cost of the prevailing myth of progress, and, given 
their core identity as ‘Americans’, it is no wonder
they are susceptible to Trump’s jingoistic populism. 

No one remembers Harley Kilgore any more, and it’s impossible to 
know whether his socially oriented vision of science policy might have 
contributed to a more equitable linking between scientific advance and 
economic benefit. But it is more than simply ironic that Kilgore’s home
state of West Virginia — whose per capita income ranks 49th out of 
the 50 states — is now Trump's strongest supporter. 

Having claimed for more than a half a century that science-based 
innovation would be good for everyone, science advocates and 
scientists who have benefited so greatly from this line of argument 
can hardly now say, “Oh, but it’s not our fault, these are problems of 
trade and labour and economic policy”. Trump’s ascendance should 
rekindle the Bush-Kilgore debates, and policymakers should seriously 
consider what a system of socially responsible and responsive science 
would look like. The current system has failed the test.


Daniel Sarewitz is co-director of the Consortium for Science, Policy and 
Outcomes at Arizona State University, and is based in Washington DC. 
e-mail: daniel.sarewitz@asu.edu 
RESEARCH HIGHLIGHTS
Selections from the scientific literature


Single ions make
sharper images


A microscope creates images with nanometre resolution by
exposing samples to single ions.

In electron and ion microscopy, increasing a sample’s exposure
time can improve the signal-to-noise ratio and result in clearer
images, but this can damage or contaminate the sample. To avoid
this, Georg Jacob at the University of Mainz in Germany and his
colleagues used an electric-field ‘trap’ to release calcium ions
one by one.

Each ion, either transmitted or blocked by the sample,
corresponds to a pixel. By controlling the release of the ions,
the team calculated when those coming from the source and through
the sample should arrive at the detector. This allowed the team to
turn the detector on only during those times, reducing the number
of detected ‘noise’ ions.

The microscope showed a fivefold increase in the signal-to-noise
ratio, and could potentially cut noise signals by a factor of one
million, compared with current methods. It also pinpointed the
position of a 1-micrometre hole in a diamond sample with a
precision of 2.7 nanometres.
Phys. Rev. Lett. 117, 043001 (2016)


Ketones alter
metabolism


Athletes’ physical endurance can be enhanced by drinking
ketones — biochemical fuel normally produced during starvation.

Fasting or prolonged exercise drives liver cells to make ketone
bodies as a fast-acting fuel to help tissues cope with the energy
deficit. To test the effect of these molecules on exercise
metabolism, Pete Cox at the University of Oxford, UK, and his
colleagues gave endurance athletes a drink containing ketones.
They found that after a prolonged period of cycling, the
metabolism of those who consumed ketones had shifted so that they
conserved glucose and burned more fat than those who did not
receive ketones. In a 30-minute time trial done after one hour of
cycling, athletes who had consumed ketones and carbohydrates
cycled more than 400 metres further, on average, than those who
had eaten only carbohydrates.
Cell Metab. http://doi.org/bm8z (2016)


CONSERVATION


Farmed salmon go wild


Norway’s wild salmon owe part of their genetic make-up to
escapees from salmon farms, which could compromise the fitness of
the wild population.

Wild Atlantic salmon (Salmo salar; pictured) are more genetically
diverse and generally better adapted to the environment than are
their farmed counterparts. Sten Karlsson and his colleagues at the
Norwegian Institute for Nature Research in Trondheim looked at
genetic markers in 21,562 wild salmon from 147 locations around
Norway. They found significant genetic material from farmed salmon
in wild fish from about half of these locations, with the average
wild population showing 6% farmed genetic heritage. In some
locations, this rose to 42%.

Wild populations in areas with many salmon farms contained higher
levels of farmed salmon DNA than did those in regions with less
farming. Managers of both wild and farmed animals should work to
minimize mating between the two populations, the authors say.
ICES J. Mar. Sci. http://doi.org/bm6j (2016)


Humpbacks to
the rescue


Humpback whales come to the assistance of other species by
harassing the killer whales that are attacking those animals.

Robert Pitman at the Southwest Fisheries Science Center in
La Jolla, California, and his colleagues reviewed reports of
115 interactions between humpback whales (Megaptera novaeangliae)
and killer whales (Orcinus orca). In at least 31 cases, humpbacks
approached and ‘mobbed’ killers as the killers attacked or
fed — mostly on sea lions, seals and other whale species.

The authors suggest that the humpbacks responded to the calls of
the attacking killers. They add that humpbacks do not seem to
benefit from helping other animals, so this may be a case of
interspecies altruism.
Marine Mammal Sci. http://doi.org/bm58 (2016)


Fruit flies care 
about texture 


Fruit flies prefer food with 
certain textures, thanks to 
specific neurons in the brain 
that connect to taste sensors in 
the tongue. 

Yali Zhang and Craig 
Montell at the University of 
California, Santa Barbara, and 
their colleagues gave fruit flies 
(Drosophila melanogaster) 
liquid solutions of varying 
viscosity, and solid foods of 
varying hardness. The flies 
preferred food that was less 
viscous and of an intermediate 
hardness. The authors 
identified a type of neuron, 
called md-L, that responded 
to mechanical stimulation 
of the sensory hairs on the 
tongue. When the team 
stimulated these neurons, the 
flies’ feeding behaviour altered 
according to the strength of 
the stimulus. 

The researchers pinpointed 
a protein, TMC, that is found 
in the neurons’ membranes 
and is required for texture 
sensation. Mammals and 
other animals also express 
TMC proteins, suggesting that 
they may also have neuronal 
sensors for food texture. 
Neuron http://doi.org/bm8x 
(2016) 


IMMUNOLOGY 


Gut microbes 
boost antibodies 


Intestinal bacteria release 
metabolic by-products that 
support antibody-producing 
immune cells. 

Gut microbes produce
short-chain fatty acids as they
digest dietary fibre. Chang
Kim and his colleagues at
Purdue University in West
Lafayette, Indiana, treated
cultured B cells with the fatty
acids and found that this
enhanced the expression
of genes that help the cells
to develop into
antibody-producing factories known
as plasma B cells. The
treatment also increased the
cells’ metabolism, helping
to support the
energy-consuming process of making
antibodies.

Mice fed a low-fibre diet 
were more susceptible than 
other animals to infection 
by the pathogen Citrobacter 
rodentium and had weaker 
immune responses. Treating 
the mice with short-chain 
fatty acids or dietary fibre 
increased antibody production 
and reversed this immune 
deficiency. 

Cell Host Microbe http://doi.org/bm82 (2016)


EVOLUTION 


Long trips foster 
tool use 


Chimpanzees that travel 
farther in search of food 
are more likely to use tools 
than are their less-travelled 
counterparts. 

Thibaud Gruber and his 
colleagues at the University 
of Neuchâtel in Switzerland
studied wild chimpanzees 
(Pan troglodytes schweinfurthii; 
pictured) in Uganda, using six 
years of experimental data and 
seven years of observations. 
The researchers drilled small 
holes in logs and filled them 
with honey, which the chimps 




could access only using a 
tool such as a leaf sponge 
(pictured). Chimps were 
more likely to use tools to get 
honey when they had recently 
foraged over longer distances, 
compared to those that 
travelled less. Published data 
on wild chimps also revealed 
that communities that travel 
more use a larger repertoire of 
feeding tools. 

Moving long distances 
may have helped to drive the 
development of early human 
technology, the authors say. 
eLife http://doi.org/bm6c (2016) 


ENVIRONMENTAL SCIENCE 


Humans have a
hand in wildfires 


People strongly influence 
the likelihood of fires in 
forests, grasslands and other 
ecosystems across the United 
States and Canada, mostly by 
lowering fire risks. 

People can ignite fires, 
but can also suppress them 
by altering properties of 
the land, for example by 
removing natural vegetation. 
To better understand the 
effects humans have, a team 
led by Marc-André Parisien 
of Natural Resources Canada 
in Edmonton used statistical 
models to analyse human 
and natural factors linked to 
fire probability across both 
countries between 1984 
and 2014. They found an 
association between human 
activities and fire — the 
stronger the human influence, 
the lower the probability of fire. 


Wildfires (pictured) are 
rarely purely natural, and 
fire managers should take 
this into account when 
considering how fire risks 
may shift in a warming world, 
the authors say. 
Environ. Res. Lett. 11, 075005
(2016) 


ECOLOGY 


Insecticides hurt 
male bees too 


A class of pesticides that 

has been linked to declining 
bee populations harms the 
reproductive capacity of male 
honeybees, not just that of 
queens as other research has 
shown. 

Neonicotinoid pesticides 
are currently banned by the 
European Union because 
of their effects on bees. Lars 
Straub at the University of 
Berne and his team exposed 
colonies of honeybees (Apis 
mellifera) in the field to 
two neonicotinoids that 
are commonly found in 
agricultural fields. The team 
found that in males, the 
chemicals reduced living 
sperm count by 39% and 
decreased the insects’ lifespan.

This could negatively affect 
honeybee colony fitness and 
queen survival and health, the 
authors say. 

Proc. R. Soc. B 283, 20160506 
(2016) 
SEVEN DAYS


Hinkley deal delay 


In a surprise move, the UK
government delayed its final
approval of a new nuclear
power plant at Hinkley Point
on 28 July, just hours after
French energy company EDF,
which is financing most of
the project’s construction,
gave it the go-ahead. Hinkley
Point C would be the first new 
UK nuclear power station this 
century, and has an £18-billion 
(US$24-billion) price tag. 
China has signed up to provide 
one-third of the cost. The 
project was championed by the 
previous UK government, but 
the current government said 
that it needed time to review 
the deal. 


Farewell Philae

On 27 July, the European Space Agency (ESA) switched off radio
communications with Philae, the space probe that made history by
landing on a comet in November 2014. Philae had a bright but
unlucky career. After landing on the comet
67P/Churyumov-Gerasimenko, it failed to grip the comet’s surface
and bounced into a shady spot where it was unable to charge its
solar panels. It performed just 64 hours of experiments before its
batteries died. The probe signalled again briefly in June and
July 2015, but by February, ESA said that there was “close to
zero” chance of hearing from it again. Philae’s parent satellite,
Rosetta, is preparing for its own demise. It will crash land on
67P on 30 September, and scientists hope that it will collect a
rich trove of cometary data on its approach.


“I believe in science. I believe that climate change is real.”

Hillary Clinton gives time to science in her speech accepting the
Democratic nomination for US president at the party’s convention
in Philadelphia, Pennsylvania, on 28 July.


EU funds research on migrant crisis

The European Commission is making €11 million (US$12.3 million)
available for research that addresses challenges related to
migration. Some 1.25 million refugees have entered Europe since
the start of 2015, but management policies across the continent
are weak and poorly coordinated. As part of its Horizon 2020
framework programme, the commission will next year announce five
calls for proposals related to different policy areas, including
the integration of migrants into the workforce and society.
Carlos Moedas, European Commissioner for Research, Science and
Innovation, announced the measures last week.


Zika in Florida

Fourteen cases of Zika virus in Miami, Florida, are likely to
have arisen from local Aedes aegypti mosquitoes, state authorities
said on 1 August. Although this would be the first time that
people in the continental United States have become infected at
home and not abroad, US health authorities have long predicted
that such small local clusters would be seen in some US southern
states — but not elsewhere in the country. On the basis of past
experiences with dengue and chikungunya, viruses spread by the
same mosquito, US authorities do not expect to see any widespread
transmission, in sharp contrast to the situation in most of the
50 countries and territories that since 2015 have reported their
first ever outbreaks of Zika.


Audit revisions

A government-commissioned report on the Research Excellence
Framework (REF), the United Kingdom’s periodic audit of the
quality of its research, has called for changes to the system in
a bid to cut costs and prevent ‘gaming’. The REF, which occurs
every 5–7 years, is used to allocate about £2 billion
(US$2.6 billion) in annual research funding. Among its
suggestions, the report proposed that the audit should not give
universities credit for papers written by staff before they
joined, and that institutions should submit all their researchers
to the audit process, rather than just a selection. The government
will consider the recommendations in the report, which was led by
economist Nicholas Stern and released on 28 July, before issuing a
formal response. See page 5 and go.nature.com/2ardiyc for more.


YANNIS BEHRAKIS/REUTERS 


RESEARCH
New whale species


Scientists have identified a
new species of whale, after a 
DNA analysis of 178 beaked 
whales revealed a genetically 
distinct subset. The species, in 
the genus Berardius, is found 
in the Okhotsk and Bering 
seas in the northern Pacific 
and was previously considered 
to be a dwarf form of Baird’s 
beaked whale (B. bairdii). 
Japanese whalers had long 
acknowledged two distinct 
variants of the whale: the 
common ‘slate-grey’ variety, 
and the newly recognized 
‘black’ form, which they called
karasu, Japanese for raven.
The discovery was published
on 26 July (P. A. Morin et al.
Mar. Mamm. Sci. http://doi.org/bm7b; 2016).


PEOPLE
NIMH chief

Psychiatrist and neuroscientist Joshua Gordon will be the next
head of the US National Institute of Mental Health (NIMH), it was
announced on 28 July. Gordon (pictured) will join the institute
from the Columbia University Medical Center in New York City,
where his research has focused on schizophrenia and anxiety, and
on how genetic mutations lead to particular behaviours. Gordon is
expected to take up the post at the NIMH, which has an annual
budget of about US$1.5 billion, in September.

COURTESY OF NIH


GSK backs Britain

Drug giant GlaxoSmithKline (GSK) said on 27 July that it will
invest £275 million (US$363 million) into its UK operations,
allaying fears that big pharmaceutical companies would begin to
move out of the United Kingdom after the country’s vote last month
to leave the European Union. GSK will put the money into three
manufacturing sites, to support production of respiratory and
large-molecule biological medicines.


Indian space case

The Indian Space Research Organisation (ISRO) received a major
blow on 25 July, when the Permanent Court of Arbitration in
The Hague, the Netherlands, ruled against it in a contract
dispute. In 2011, ISRO’s commercial arm, Antrix Corporation,
scrapped a 2005 deal with satellite company Devas Multimedia in
Bangalore for the long-term lease of two satellites, citing
allegations of irregularities in the deal, including security
concerns. Devas took the matter to international arbitration
courts. Industry observers say that the Indian government may have
to pay about US$1 billion in damages to Devas.


Bioelectronics firm

London-based drug giant GlaxoSmithKline announced on 1 August
that it is teaming up with Verily, Google’s life-sciences spin-off
in South San Francisco, California, to develop electronics-based
therapies. The two companies will contribute up to £540 million
(US$713 million) to the newly established Galvani Bioelectronics,
which will be based in the United Kingdom, over 7 years. Galvani
will develop miniature implants that can alter the body’s electric
nerve signals, with the aim of treating inflammatory, metabolic
and hormone conditions.


Theranos device

Elizabeth Holmes, chief executive of the beleaguered biotechnology
firm Theranos, unveiled a new blood-testing machine on 1 August at
a meeting of the American Association for Clinical Chemistry in
Philadelphia, Pennsylvania. Theranos rose to prominence with
promises of a technology that could perform a wide range of
diagnostic tests using a few drops of blood. But its claims faced
scepticism and government scrutiny, and in July, US regulators
banned Holmes from running a lab for two years. Theranos, of
Palo Alto, California, says that its new miniLab machine can
perform a variety of tests from a finger-prick of blood, but the
device has not been independently verified.


TREND WATCH

More than 60% of people in a US survey are concerned about
scientific advances being used for human ‘enhancement’, revealed
the Pew Research Center in Washington DC on 26 July. The poll
asked 4,726 people how they felt about three potential
technologies: gene editing to reduce disease risk in babies; brain
implants to enhance brain processes; and transfusions of synthetic
blood to improve strength. In each case, fewer than half expressed
enthusiasm. See go.nature.com/2ag0jjd for more.

US PUBLIC WARY OF HUMAN ENHANCEMENT
Most people in the United States are concerned about the prospect
of using cutting-edge science to enhance human abilities.
[Chart: responses to the question “How (a) worried and
(b) enthusiastic do you feel about each of these ideas?”, shown as
the percentage of respondents who were very or somewhat worried
and very or somewhat enthusiastic about gene editing to reduce
disease risk in babies, a brain implant to improve cognitive
abilities, and a synthetic blood transfusion to improve
performance. Based on survey answers from 4,726 US adults in
March 2016.]
SOURCE: PEW RESEARCH CENTER; GO.NATURE.COM/2AJICGX


11–12 AUGUST
Chemistry graduates gather in Oxford, UK, for the Oxford Synthesis
Summer Conference.
go.nature.com/2aunytk

12 AUGUST
The annual Perseids meteor shower reaches its peak.


POLICY 


Inclusive astronomy 
On 28 July, the American 
Astronomical Society 
endorsed a statement 
intended to improve the 
experience in astronomy of 
under-represented groups 
including women, ethnic and 
racial minorities, disabled 
people and some sexual and 
gender identities. Suggested 
changes include eliminating 
discriminatory hiring practices 
and ensuring that astronomy 
institutions, facilities and data 
are accessible to everyone. The 
recommendations, spurred 

in part by discussions about 
inequality in science, came 
from a meeting on inclusive 
astronomy held in Nashville, 
Tennessee, in June 2015. 


NATURE.COM
For daily news updates see:
www.nature.com/news
Troubled Japanese space
agency seeks fresh start


Push to resurrect instrument lost during satellite failure highlights JAXA’s resilience. 


BY ALEXANDRA WITZE 


The Japan Aerospace Exploration Agency
(JAXA) is on a quest for redemption. 
In March, a software error caused the 
agency’s Hitomi X-ray astronomy satellite to 
break up in space, cutting short a planned 
three-year mission after only one month. 
Now JAXA is considering whether to 
rebuild and relaunch a copy of the space- 
craft’s key instrument — a US-built X-ray 
spectrometer — with help from NASA. On 


5 August, representatives of the two space 
agencies will meet to discuss the possibility 
of resurrecting the instrument that was the 
heart of Hitomi’s science. But whether JAXA 
can regain the confidence of the Japanese 
nation, and of its international partners, 
remains to be seen. 

Space experts note that JAXA has pulled 
off stunning recoveries before. It coaxed its 
crippled Hayabusa spacecraft to bring back 
dust from an asteroid, and nudged its Akatsuki 
probe into orbit around Venus 5 years 
after an engine failure seemed to render the 
spacecraft useless. 

“It’s important to note how resourceful 
JAXA has been at recovering from failures that 
typically would be catastrophic,” says Ralph 
Lorenz, a planetary scientist at the Johns Hop- 
kins University Applied Physics Laboratory in 
Laurel, Maryland, and co-author of the book 
Space System Failures (Praxis, 2005). 

Hitomi broke apart because an erroneous 
software command prompted the space- 
craft to spin faster and faster, until its 
solar panels flew off into space. A JAXA 
investigation blamed faulty project-manage- 
ment techniques for not catching the error. 

The failure has reverberated at every level 
of JAXA’s Institute of Space and Astronautical 
Science (ISAS) in Sagamihara, which man- 
aged Hitomi. JAXA president Naoki Okumura 
was one of three leading officials who took a 
10% pay cut for four months “to express our 
regret and caution ourselves”, he said in a June 
press conference. He has also ordered a sys- 
tems review of the institute’s next big project: 
a mission to study Earth's radiation belts that is 
slated to launch in the coming months. 

Before Hitomi, JAXA’s lowest point was 
perhaps the loss of its Nozomi mission to Mars, 
which sailed past the red planet in 2003 with- 
out entering orbit as it was supposed to. The 
same year, a new JAXA rocket design failed 
during a test launch, prompting a review of all 
agency projects. 


TRY, TRY AGAIN 
Some have questioned whether JAXA is trying 
to do too much with too little. It often assigns 
one person to cover a number of tasks that 
NASA would spread among multiple project 
engineers, says Lorenz, who collaborates on 
the Akatsuki Venus probe. 

Okumura has acknowledged as much, saying 
that ISAS will generally develop a mission 
using a small in-house team, along with the 
spacecraft manufacturer. By contrast, Hitomi 
involved a larger number of complex systems. 
There were simply not enough safeguards built 
into the process to catch the software error. 
“The previously conventional ISAS methods 
were not necessarily suited for the production 
of modern satellites and spacecraft,” Okumura said. 

JAXA has released an extraordinary level of technical 
detail about the failure. Agency officials have said that 
because Hitomi was meant as a community 
mission to serve X-ray astronomers across the 
globe, they feel obligated to explain what hap- 
pened so that nobody makes the same mistake. 

Because of this determination and open- 
ness, “I think Hitomi’s successor is in safe 
hands with JAXA,” says Elizabeth Tasker, 
an astrophysicist at Hokkaido University in 
Sapporo, Japan. 

But such projects may be a hard sell to politi- 
cians. “High-profile setbacks like Nozomi and 
Hitomi make it difficult for JAXA to justify 
big-ticket science missions in today’s political 
atmosphere,” says Saadia Pekkanen, an expert 
in Japanese space policy at the University of 
Washington in Seattle. 

JAXA has not yet decided whether a Hitomi 
successor would fly or which instruments it 
would carry, says ISAS spokeswoman Chisato 
Ikuta. But Hitomi’s premier scientific instru- 
ment was the spectrometer provided by 
NASA; data that it collected before the space- 
craft died revealed secrets about gas flows in 
the Perseus galaxy cluster. 

The spectrometer seems to be thrice 
cursed; two earlier versions on different 
satellites were lost to a launch failure and a 
coolant leakage. Even so, a NASA advisory 
group reported on 5 July that launching a 
copy of the instrument no later than 2023 
“would fulfill the immense scientific prom- 
ise of the Hitomi” spectrometer. The cost to 
rebuild would be roughly US$70 million to 
$90 million. 

Paul Hertz, NASA’s astrophysics director, will 
meet with JAXA representatives to discuss the 
options. “Certainly we would not be overseeing 
JAXA,” he told a NASA advisory committee on 
20 July. “We can discuss practices that NASA 
implements to prevent us from making avoid- 
able mistakes.” 

Other international missions in the works 
from JAXA include a magnetospheric orbiter, 
which is scheduled to launch next year on 
the European Space Agency’s BepiColombo 
mission to Mercury. 

“The Olympics of engineering is when 
things go wrong,” says Lorenz. “Maybe the best 
time to fly is right after a failure.” 


Grand proof fazes theorists 


Conference on Shinichi Mochizuki’s ‘revolutionary’ work inspires cautious optimism. 


BY DAVIDE CASTELVECCHI 


Nearly four years after Shinichi 
Mochizuki unveiled an imposing set 
of papers that could revolutionize the 
theory of numbers, other mathematicians have 
yet to understand his work — although they 
have made modest progress. 

Some four dozen mathematicians converged 
in Japan last week for a rare opportunity to 
hear Mochizuki present his monumental proof 
of the 31-year-old abc conjecture, which sits at 
the heart of number theory. The conference 
took place on his home turf, at Kyoto Univer- 
sity’s Research Institute for Mathematical Sci- 
ences (RIMS). 

Mochizuki is “less isolated than he was before 
the process got started”, says Kiran Kedlaya, a 
number theorist at the University of California, 
San Diego. At first, Mochizuki’s proof, which 
stretches over more than 500 pages (available 
at go.nature.com/2amidei), seemed like an 
impenetrable jungle of formulae. But experts 
have slowly discerned a strategy, and have 
zeroed in on particular passages that seem 
crucial, Kedlaya says. 

And Jeffrey Lagarias, a number theorist at 
the University of Michigan in Ann Arbor, says 
that he got far enough to see that Mochizuki’s 
work is worth the effort. “It has some revolutionary 
new ideas,” he says. 

Still, Kedlaya says that the more he delves 
into the proof, the longer he thinks it will take 
to reach a consensus on whether it is cor- 
rect. He used to think that the issue would be 
resolved perhaps by 2017. “Now I’m thinking 
at least three years from now.” 

Others are even less optimistic. “The con- 
structions are generally clear, and many of 
the arguments could be followed to some 
extent, but the overarching strategy remains 
totally elusive for me,” says mathematician 
Vesselin Dimitrov of Yale University in New 
Haven, Connecticut. “Add to this the heavy, 
unprecedentedly indigestible notation: these 
papers are unlike anything that has ever 
appeared in the mathematical literature.” 


THE ABC PROOF 

The abc conjecture relates to prime num- 
bers — whole numbers that cannot be evenly 
divided by any smaller number except 1. The 
conjecture comes in a number of different 
forms, and explains how the primes that divide 
two numbers, a and b, are related to those that 
divide their sum, c. 
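
One common way to make this relationship concrete uses the radical of a number, rad(n): the product of its distinct prime factors. For whole numbers a + b = c with no shared prime factor, the conjecture says, roughly, that c is rarely much larger than rad(abc). The Python sketch below is an illustration only and is not drawn from Mochizuki’s papers; the function names `radical` and `quality` are chosen here for the example.

```python
import math

def radical(n: int) -> int:
    """Product of the distinct primes dividing n (rad(1) = 1)."""
    rad, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            rad *= p
            while n % p == 0:  # strip every copy of this prime
                n //= p
        p += 1
    if n > 1:                  # whatever remains is itself prime
        rad *= n
    return rad

def quality(a: int, b: int) -> float:
    """log(c) / log(rad(abc)) for the coprime triple a + b = c."""
    if math.gcd(a, b) != 1:
        raise ValueError("a and b must share no prime factor")
    c = a + b
    return math.log(c) / math.log(radical(a * b * c))

print(radical(72))                          # 72 = 2^3 * 3^2, so rad = 6
print(round(quality(1, 8), 4))              # triple 1 + 8 = 9
print(round(quality(2, 3**10 * 109), 4))    # triple 2 + 3^10 * 109 = 23^5
```

Triples with a quality above 1 are the unusual ones. In one standard formulation, the conjecture asserts that for any ε > 0, only finitely many such triples have a quality above 1 + ε.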

If Mochizuki’s proof is correct, it would 
have repercussions across the entire field, says 
Dimitrov. “When you work in number theory, 
you cannot ignore the abc conjecture,” he says. 
“This is why all number theorists eagerly 
wanted to know about Mochizuki’s approach.” 
For example, Dimitrov showed in January 
how, assuming the correctness of Mochizuki’s 
proof, one might be able to derive many 
other results, including an independent proof 
of the celebrated Fermat’s last theorem 
(V. Dimitrov. Preprint available at 
http://arxiv.org/abs/1601.03572; 2016). 

The purported proof, which Mochizuki 
first posted on his webpage in August 2012, 
builds on more than a decade of previous 
work, in which he developed a novel and 
extremely abstract branch of mathematics 
in virtual isolation. 


MOCHIZUKI IN THE ROOM 

The Kyoto workshop followed on the heels 
of one held last December in Oxford, UK. 
Mochizuki did not attend that first meet- 
ing, although he answered questions over 
a video link. This time, having him in the 
room — and hearing him present some of 
the materials himself — was helpful, says 
Taylor Dupuy, a mathematician at the 
Hebrew University of Jerusalem. 

Around ten mathematicians are now 
putting substantial effort into digesting the 
material — up from three before the Oxford 
workshop, says Ivan Fesenko, a mathemati- 
cian at the University of Nottingham, UK, 
who co-organized both workshops. 

Mochizuki did not take part in the cus- 
tomary mingling and social activities at 
the Kyoto meeting. And although he was 
unfailingly forthcoming in answering ques- 
tions, it was unclear what he thought of the 
proceedings. “Mochizuki does not give a 
lot away,” Kedlaya says. “He’s an excellent 
poker player.” 

Mathematicians have criticized Mochi- 
zuki for his refusal to travel: after he posted 
his papers, he turned down multiple offers 
to go abroad. He spent much of his youth in 
the United States, but is now said to rarely 
leave the Kyoto area. (Mochizuki does not 
respond to interview requests, and the 
workshop’s website noted: “Activities aimed 
at interviewing or media coverage of any 
sort within the facilities of RIMS, Kyoto 
University, will not be accepted.”) 

“He is very level-headed,” says another 
workshop participant who did not want to 
be named. “The only thing that frustrates 
him is people making rash judgemental 
comments without understanding any 
details.” Still, Dupuy says, “I think he does 
take a lot of the criticism about him really 
personally. I’m sure he’s sick of this whole 
thing, too.” 


David Davis leads the UK government’s Department for Exiting the European Union. 


UK scientists seek 
Brexit influence 


They hope for active role in negotiations to exit EU. 


BY ELIZABETH GIBNEY 


British science’s largest lobbying campaign 
in years is under way. After the shock 
of the United Kingdom’s vote to leave 
the European Union, anxious researchers are 
doing all they can to ensure that their interests 
are represented in Brexit negotiations. One 
big unanswered question is what role science 
will have in the new ‘Brexit ministry’ — the 
Department for Exiting the European Union 
(DEEU) — that has been expressly formed to 
take the country out of the EU. 

Worried at the prospect of losing access 
to EU funding and collaborations, scientific 
societies have fired off numerous letters asking 
the government to keep their country 
in the EU’s research system, and warning of 
damage already caused by Brexit. An advocacy 
group, Scientists for EU, says it has gathered (in 
confidence) 25 cases of foreign scientists withdrawing 
job applications or being refused a UK 
post as a result of Brexit, 7 cases of someone 
in UK science leaving the country, and 33 of 
disruption to funding for the EU’s Horizon 
2020 research-grants programme. 

The government has indicated that it is 
listening to scientists — but seems reluctant to 
say so too loudly. On 18 July, Prime Minister 
Theresa May sent a letter to Paul Nurse, the 
director of London's Francis Crick Institute, 
telling him that the government was committed 
to “ensuring a positive outcome for UK science” 
as the country exited the EU. But the letter — 
effectively May’s first statement on science — 
did not become public knowledge until science 
minister Jo Johnson referred to it in passing in a 
25 July speech at the EuroScience Open Forum 
in Manchester, prompting journalists to press 
for a copy. Venkatraman Ramakrishnan, the 
president of London’s Royal Society, said he 
welcomed the comments and was looking for- 
ward to working with May and her colleagues 
“to turn these words into action”. 


What action May will take remains 
unclear: prospects for science are inextrica- 
bly entangled with the wider Brexit issues of 
freedom of movement and UK access to the 
EU’s single market. David Davis, a Mem- 
ber of Parliament who had campaigned on 
the ‘leave’ side of the referendum, leads the 
DEEU. He has announced plans to conduct 
a “huge consultation” ahead of the start of 
formal EU exit negotiations, which May has 
postponed until at least 2017. 


SCIENCE IN THE BREXIT MINISTRY 

Davis’ team is talking to “the research 
institutes”, he told Sky News on 17 July — but 
his department could not confirm which 
bodies this referred to. UK national aca- 
demies have written jointly to Davis and 
“look forward to working with him to 
ensure that science’s voice is heard in Brexit 
negotiations”, the Royal Society told Nature. 

Some hope that the Brexit ministry will 
contain specific advocates for research. 
“There should be some sort of champion 
for science within the department,” says 
John Beddington, a population biologist at 
the Oxford Martin School, and a former UK 
chief scientific adviser. An obvious choice is 
science minister Johnson, Beddington says, 
although the DEEU could also dedicate a 
group of civil servants to the job. Johnson 
could be a “very strong, very early voice” in 
DEEU deliberations, Sharon Witherspoon, 
policy chief at the UK Academy of Social 
Sciences, told a House of Lords inquiry on 
19 July. She added that research needed 
“urgent attention, and cannot wait to be an 
afterthought”. 

Giving more-formal responsibilities to 
Johnson, whose role in May’s government 
is split between the education and business 
departments, might be a stretch. “If anyone 
can do it, Jo can. But I’m not confident that 
the best voice for the science community 
would be to add another job on for Jo,” says 
Nick Hillman, director of the Oxford-based 
Higher Education Policy Institute. 

A different potential conduit for 
scientific input could be the DEEU’s 
departmental board, an advisory body 
that, in other departments, often includes 
senior business figures. And another idea 
is for Davis’s department to appoint a chief 
scientific adviser (CSA), as most other UK 
ministries already have. But Beddington 
says that although the DEEU and the newly 
created Department for International Trade 
should each have a CSA, their role should 
not be to advocate for science, but to feed 
advice into the negotiations on issues such 
as environmental regulations, product 
standards and health and safety. “Whether 
to appoint a CSA is the kind of thought 
process they should be going through,” says 
Hillman. “It doesn’t mean they are there 
yet, though.” 
Daniel Himmelstein, pictured at his previous research post at the University of California, San Francisco. 


INTELLECTUAL PROPERTY 


Legal maze threatens 
to slow data science 


Researcher who spent months chasing permission to 
republish online data sets urges others to read up on the law. 


BY SIMON OXENHAM 


Knowledge from millions of biological 
studies encoded into one network — 
that is Daniel Himmelstein’s alluring 
description of Hetionet, a free online resource 
that melds data from 28 public sources on links 
between drugs, genes and diseases. But for a 
product built on public information, obtaining 
legal permissions has been surprisingly tough. 
When Himmelstein, a data scientist at the 
University of Pennsylvania in Philadelphia, 
contacted researchers for permission to repro- 
duce their work openly, several said they were 
surprised that he had to ask. “It never really 
crossed my mind that licensing is an issue 
here,” says Jörg Menche, a bioinformatician at 
the Research Center for Molecular Medicine of 
the Austrian Academy of Sciences in Vienna. 
Menche rapidly gave consent — but not 
everyone was so helpful. One research group 
never replied to Himmelstein, and three 
replied without clearing up the legal confu- 
sion. Ultimately, Himmelstein published the 
final version of Hetionet in July — minus one 
data set whose licence forbids redistribution, 
but including the three that he still lacks clear 
permission to republish. The tangle shows that 
many researchers don’t understand that simply 
posting a data set publicly doesn’t mean others 
can legally republish it, says Himmelstein. 

The confusion has the power to slow down 
science, he says, because researchers will be 
discouraged from combining data sets into 
more useful resources. It will also become 
increasingly problematic as scientists pub- 
lish more information online. “Science is 
becoming more and more dependent on reusing 
data,” Himmelstein says. 


DATA-SET LAWS 

Because a piece of data — a fact — cannot be 
copyrighted, many scientists think that a pub- 
licly posted data set that does not place explicit 
terms and conditions on access can simply be 
republished without legal problems. But that’s 
not necessarily correct, says Estelle Derclaye, 
a specialist in intellectual-property law at the 
University of Nottingham, UK. 

The European Union assigns specific data- 
base rights, independent of copyright, that aim 
to protect the investment made in compiling 
a database. Legally speaking, these rights pre- 
vent researchers such as Himmelstein from 
republishing data sets created by scientists in 
EU states without their consent. 

Other countries have different layers of 
legal protection. But even in jurisdictions 
such as the United States, where no separate 
rights exist to govern databases, there is still 
room for confusion. Although facts don't 
qualify for copyright, the way they are com- 
piled arguably might — if the act of making 
that compilation requires sufficiently crea- 
tive expression. “The default legal position 
on how data may be used in any given context 
is hard to untangle,” according to a guide on 
licensing data issued by the Digital Curation 
Centre in Edinburgh, UK. 

Advocates of data-sharing accordingly rec- 
ommend that researchers who are creating 
public databases add clear licences explaining 
how they intend their data to be reused and 
redistributed, and whether they waive any 
database rights. 


LACK OF CONFIDENCE 

In Himmelstein’s case, some of the data sets 
that he wanted to use had clear licences — and 
some of these prevented unrestricted redistri- 
bution, but others did not. The most frustrat- 
ing part of his project, he says, was the feeling 
that good data were going to waste because 
their creators could not clarify whether he 
could republish them. 


Andrew Charlesworth, an intellectual- 
property expert at the University of Bristol, 
UK, says that this may be because few re- 
searchers were confident enough of the law to 
give Himmelstein clear guidance. “What you 
tend to find is that if nobody has a remit to 
answer those kinds of questions, they are not in 
a hurry to take it on,” he says. 

Even without clear permissions, Himmelstein 
is unlikely to face legal penalties for publishing 
Hetionet, says Jonathan Band, an intellectual-property 
lawyer with the law firm Policy Bandwidth in 
Washington DC — unless, that is, he mistakenly 
breached terms and conditions placed on 
the data sets. Academics who put their data 
sets publicly online usually intend their work 
to be available for others to republish freely; 
and no one has ever got into trouble for doing 
Himmelstein’s kind of project, Band adds. 
But Himmelstein is not convinced that he 
is legally in the clear — and feels that such 
uncertainty may deter other scientists from 
reproducing academic data. If a researcher 
launches a commercial product that is based 
on public data sets, he adds, the stakes of not 
having clear licensing are likely to rise. “I think 
these are largely untested waters, and most 
academics aren't in the position to risk setting 
off a legal battle that will help clarify these 
issues,” he says. 


CORRECTIONS 
The News Feature ‘Physics on two wheels’ 
(Nature 535, 338-341; 2016) contained 
several biographical inaccuracies. Michael 
Papadopoulos moved his family to the 
United States more than a decade before 
taking a job at Oregon, not in 1967. Jim 
Papadopoulos spent a whole academic 
year at Oregon before starting at MIT. He 
did not write to bike companies asking for 
work until the 1990s. His time at the US 
Geological Survey was part of an internship, 
not a full-time job. The e-mail list he 
moderated was also founded by him, and 
is called Hardcore Bicycle Science. He has 
actually published three first-author papers, 
but just one related to bicycle science. He 
was also not given a chance to respond to a 
comment about his ability to finish things. 
The News Feature ‘The beer geeks’ 
(Nature 535, 484-486; 2016) misattributed 
the quotes in the last paragraph. They came 
from Kevin Verstrepen, not Stijn Mertens. 
SCHOLARLY OLYMPICS 
HOW THE GAMES HAVE SHAPED RESEARCH 


SURGE IN SCIENCE 

The proportion of research papers that are about the 
Games has risen rapidly. Over the past few decades, 
the Olympics has also expanded the number of 
events, drawn more participants and become vastly 


PAPERS PER GAMES 


Beijing 2008 inspired the most papers, 
followed by London 2012. 

Beijing had imposed special restrictions 
on air pollutants, providing a rare 
opportunity for researchers to do relatively 
controlled experiments, says David Rich, 
an environmental epidemiologist at the 
University of Rochester in New York. 

The London 2012 Olympics inspired 
topics ranging from urban development 
and sprawl to security and surveillance. 


[Chart: Olympics articles (% of all papers)] 


Whether it’s drug scandals, pollution problems or 
sheer curiosity at the incredible capabilities of the 
athletes, the Olympic Games have long fascinated 
researchers as well as the general public. 

In recent decades, research has increased on 
the selection of Olympic sites, environmental issues 
and the Games’ ability to encourage people to 
participate in sport, says sports-medicine specialist 
Lars Engebretsen, who heads science and research 
for the International Olympic Committee. 

The Olympics don’t typically inspire researchers 
to start new fields — instead, they tend to feed into 
ongoing studies, says Vanessa Heggie, a historian 
of science and sports medicine at the University of 
Birmingham, UK. 

As the 2016 summer Games kick off in Rio 
de Janeiro, Nature uses bibliometrics to provide 
insight into the who, where, what, how and why of 
Olympic science. 


GREECE TAKES GOLD 

The countries that have published the most Olympics 
research are the usual science powerhouses. But divide 
the number of Olympics papers by the total number of 
papers published by that country — and different nations 
take the lead, with Greece at the front of the pack. 
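
The normalization described above is a simple ratio. The Python sketch below uses placeholder countries and invented counts, purely to show how a ranking by raw count can differ from a ranking by share; none of the numbers comes from Nature’s analysis, and the function name `olympics_share` is chosen here for illustration.

```python
def olympics_share(olympics_papers: int, total_papers: int) -> float:
    """Fraction of a country's papers that are about the Olympics."""
    if total_papers <= 0:
        raise ValueError("total_papers must be positive")
    return olympics_papers / total_papers

# Placeholder data (hypothetical): country -> (Olympics papers, all papers)
counts = {
    "Country A": (900, 6_000_000),
    "Country B": (120, 200_000),
    "Country C": (60, 150_000),
}

# Ranking by raw number of Olympics papers
by_count = sorted(counts, key=lambda c: counts[c][0], reverse=True)
# Ranking after dividing by each country's total output
by_share = sorted(counts, key=lambda c: olympics_share(*counts[c]), reverse=True)

print(by_count)
print(by_share)
```

With these made-up figures, the largest producer leads on raw count but drops to last place once its total output is taken into account.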

The Olympic Games date back to the eighth century BC, 
and Greek scientists are naturally proud of their heritage, 
says Minas Samatas, a political sociologist at the 
University of Crete in Rethymno, who studied the 2004 
Olympics in Athens. 

Norway boasts the second highest fraction of Olympics 
papers, and has won the most medals in the winter 
Games. Most of its 61 papers are about the winter Games 
or winter sports, especially skiing. 


Using the Scopus database, Nature searched for 
articles that have “Olympics” or “Olympic Games” 
in the title or “Olympic Games” in the abstract. 


THE DISCIPLINES COMPETE 

The social sciences have generated the most Olympics 
papers — with medicine and engineering winning silver 
and bronze, respectively. 

The Olympics are an “urban change-maker”, says 
sociologist Jacqueline Kennelly at Carleton University in 
Ottawa, Canada. They have led to expensive infrastruc- 
ture projects and placed huge demands on public trans- 
port. And those that have contended with world wars, 
protests, boycotts and terrorist attacks have generated 
substantial literature. 

Social scientists have also used the Games to study 
diverse topics such as the relationships between athletes 
and coaches (R. A. Philippe and R. Seller Psychol. Sport 
Exer. 7, 159-171; 2006) and how much the medal count 
influences national pride (I. van Hilvoorde et al. Int. Rev. 
Sociol. Sport 45, 87-102; 2010). 


CITATIONS IN THE CITY 

The paper that has generated the most 
citations focuses on the Atlanta 1996 
Games, and is followed closely by one about 
Beijing 2008. Both articles explore how 
policies such as increased provision of public 
transportation can improve air quality. 

The fifth most highly cited paper analysed 
levels of enthusiasm about the 2000 
Olympics among different resident groups 
in the host city, Sydney. It is the most highly 
cited Olympics paper in the social sciences. 

M. S. Friedman et al. Impact of changes in transportation and commuting behaviors during the 1996 
Summer Olympic Games in Atlanta on air quality and childhood asthma. J. Am. Med. Assoc. 
285, 897-905 (2001). 

D. G. Streets et al. Air quality during the 2008 Beijing Olympic Games. Atmos. Environ. 
41, 480-492 (2007). 

W. Schänzer and M. Donike. Metabolism of anabolic steroids in man: synthesis and use of reference 
substances for identification of anabolic steroid metabolites. Anal. Chim. Acta 275, 23-48 (1993). 

S. Sarna et al. Increased life expectancy of world class male athletes. Med. Sci. Sports. Exerc. 

The Cybathlon is a cyborg Olympics that will help disabled people 
to navigate the most difficult course of all: the everyday world. 


BY SARA REARDON 


Vance Bergeron was once an amateur 
cyclist who rode 7,000 kilometres per 
year — much of it on steep climbs in 
the Alps. But in February 2013, as the 
50-year-old chemical engineer was 
biking to work at the École Normale 
Supérieure in Lyons, France, he was hit by a 
car. The impact sent him flying through the air 
and onto his head, breaking his neck. When he 
woke, he learnt that he would never again move 
his legs on his own, and would have only limited 
use of his arms. 

Confined to bed for months while his body 
did what healing it could, Bergeron began 
to look for a way back to cycling. He started 
to study neuroscience, with an emphasis on 
research into robotic prostheses that could turn 
people like him into ‘cyborgs’: combinations 
of human and machine. He learnt that some 
of these prostheses used a technique known 
as functional electrical stimulation (FES) to 
deliver electrical signals to atrophied limbs or 
the stumps of missing ones, causing the muscles 
to contract and restoring some function. 

As soon as Bergeron had recovered enough 
to use a wheelchair, he took that idea back to 
the lab, where he switched his research focus 
to neuroscience. Using himself as a guinea pig, 
he and his team worked out how to stimulate 
the nerves in his legs so that his muscles would 
flex and pedal a bike. “I have become my own 
research project and it’s a win-win,” he says. 

Even with regular exercise sessions to build 
muscle, Bergeron’s artificially stimulated legs 
have produced at most 20 watts of power, barely 
one-tenth of the 150-200 watts produced by an 
average cyclist. But he and his team are building 
the FES controller and electrodes into a carbon- 
fibre recumbent tricycle that he hopes will help 
him to do better — and perhaps win a medal on 
8 October, when he takes his machine to Zurich, 
Switzerland, to race against other FES cyclists in 
the Cybathlon: the first cyborg Olympics. 


MACHINE LEARNING 

Around the world, nearly 80 research groups in 
25 countries are honing their technologies for 
the €5-million (US$5.5-million) event. They 
range from small, ad hoc teams to the world’s 
largest manufacturers of advanced prostheses, 
and comprise about 300 scientists, engineers, 
support staff and competitors: disabled people 
who will each compete in one of six events that 
will challenge their ability to tackle the chores 
of daily life. A race for prosthetic-arm users will 
be won by the first cyborg to complete tasks 
including preparing a meal and hanging clothes 
on a line. A powered-wheelchair race will test 
how well participants can navigate everyday 
obstacles such as bumps and stairs. 

The venue — Zurich's 7,600-spectator ice- 
hockey stadium — should combine with the 
presence of television cameras and team jerseys 
to give the Cybathlon a sporting vibe similar 
to that of the Paralympics, in which disabled 
athletes compete using wheelchairs, running 
blades and other assistive technologies. The 
difference is that the Paralympics celebrates 
exclusively human performance: athletes must 
use commercially available devices that run on 
muscle power alone. But the Cybathlon honours 
technology and innovation. Its champions will 
use powered prostheses, often straight out of 
the lab, and are called pilots rather than athletes. 
The hope is that devices trialled in the games 
will accelerate technology development and 
eventually be used by people around the world. 

The everyday tasks being tested in the 
Cybathlon are much more difficult than they 
seem, says Robert Riener, a biomedical engi- 
neer at the Swiss Federal Institute of Technol- 
ogy in Zurich and creator of the Cybathlon. “I 
think that people are spoiled by the Internet and 
Hollywood movies,” he says. “We want to show 
people there are still challenges.” 

Riener traces the origins of the Cybathlon 
back to news accounts of a charity event: in 
November 2012, a man called Zac Vawter, 
who had lost a leg in a motorcycle accident, 
used an experimental, motorized prosthetic 
leg to climb the 103 storeys of Willis Tower in 
Chicago, Illinois, in just 45 minutes. 

The feat impressed the media — and Riener. 
But it also frustrated him: Vawter’s device, 
along with similarly impressive prostheses 
from Riener’s lab and others around the world, 
were not reaching people who needed them. 
“We're doing great work but not selling it 
well,” he thought. So why not take inspiration 
from the Willis Tower stunt, and draw atten- 
tion to the technology through a competition 
open to everyone in the prosthetics-research 
community? 

Riener’s 30-person lab team was excited about 
designing such an event. And before long, word 
had spread to colleagues around the world. 

At first, Riener had considered hosting 
showy events such as climbing a mountain on 
prosthetic limbs. But he changed his mind in 
2013, after talking to an acquaintance who had 
lost an arm to cancer and wore a prosthesis. 
The device ended in a hook that was moved 
by cables when the man flexed certain muscles 
in his stump. It worked well enough for large 
movements, but was hopeless for fine control.
Once, the man told Riener, he had been
buying cinema tickets, and could feel the people
queuing behind him staring and growing
impatient as he struggled to draw out his wallet
and grasp the pieces of paper.

Pilot Matt Standridge will compete in the Cybathlon using an exoskeleton designed to help people with paraplegia to walk.

These mundane challenges, Riener realized, 
were greater and more meaningful than the 
need to design, say, a spring-like leg that simply 
helps someone to run fast. So he decided that 
most of the Cybathlon competitions would be 
distinctly non-Olympian. 


BRAIN POWER 

Easily the strangest will be the brain–computer
interface (BCI) race, which will feature 15 pilots 
sitting still for 4 minutes while large screens in 
the arena show what is going on in their heads. 
Each will attempt to guide an on-screen char- 
acter through an obstacle course using specific 
patterns of brain activity, translated by an elec- 
trode cap into three commands: accelerate, 
jump over spikes or roll under laser rays. 

In principle, the patterns can be anything. At 
the University of Essex in Colchester, UK, for 
instance, a team of current and former students 
led by postdoc Ana Matran-Fernandez has 
designed an algorithm that associates the three 
motions with a pilot thinking of his or her hand 
or foot, or working through a maths equation. 

The electrical signals are weak, and each 
individual is different, so it can be difficult to
distinguish between the commands — espe- 
cially when a pilot is distracted, for example 
by cheering and adrenaline in the competi- 
tion. Constantly thinking about tasks is men- 
tally exhausting, says neuroscientist José del 
R. Millan of the Swiss Federal Institute of Tech- 
nology in Lausanne, whose team is working on 
ways to predict thought patterns to make the 
association more natural and let the pilot relax. 

BCIs will probably never be used for real
jumping and running, because detecting elec- 
trical activity in muscles is much easier. But if 
such devices could be made cheap and accu- 
rate enough, they could help disabled people to 
guide wheelchairs, cursors or even Skype-ena- 
bled robots that would let them participate vir- 
tually in an event. “The fact that you can develop 
this in the lab and bring it out and see it works 
means there’s a future,” says Matran-Fernandez.

Other Cybathlon events will highlight the 
great strides being made with more-conven- 
tional devices. In the prosthetic-leg race, com- 
petitors must get past obstacles such as stairs, 
randomly placed stones, tilted pavements and 
doors — not to mention sitting down in a chair 
and standing up again. Several participants 
will be using state-of-the-art smart knees and 
ankles that can detect force and acceleration as 
they walk, and correct their motion
if they start to fall. 

But even the most advanced 
engineering pales beside what the 
intact body does naturally. When a 
person picks up a pen with a flesh 
and blood arm, their brain and 
peripheral nervous system coordi- 
nate how far to reach, how to bend 
each joint in each finger into a pre- 
cise shape, and how hard to grasp 
— all without conscious effort. 
Standard movable prostheses, 
such as the type with the hook and 
cables, require the user to do all of 
this consciously. This takes a great 
effort, which is one reason many 
amputees choose not to wear them. 

To get around that, research- 
ers have to create computer algo- 
rithms that decode signals from muscles and 
nerves and predict what a wearer is trying to do. 
In Burnaby, Canada, a Cybathlon team called 
MASS Impact is working with pilot Danny 
Letain, a former Canadian Paralympic skier 
who lost his left arm in a 1980 railway accident. 
The team has built an arm with a panel of flat 
buttons that sits on Letain’s arm stump. 

Using his memory of a hand, Letain imagines
making one of 11 gestures, such as pointing.
The muscles in his stump then compress the
buttons and tell his artificial hand to do what he 
intends. Letain was pleased to find that the brain 
circuitry that once controlled his fingers is still 
in working order, long after he stopped feeling 
any ‘phantom pain’ in his lost arm. “I’m using
something I haven't used in 35 years,” he says. 

Some arms are even more advanced. A 
team led by biomedical engineer Max Ortiz 
Catalan at Chalmers University of Technol- 
ogy in Gothenburg, Sweden, has developed a 
two-way prosthetic hand that can feel as well 
as move (M. Ortiz-Catalan et al. Sci. Transl. 
Med. 6, 257re6; 2014). The arm is permanently 
implanted in the wearer's bone, and uses up to 
nine electrodes to convey motor commands 
from the remaining muscles to the prosthesis, 
and to send signals from sensors in the fingers 
back to the arm’s sensory nerves. Cybathlon 
pilot Magnus Niska is the only person in the 
world who wears such a prosthesis outside the 
lab. Ortiz Catalan hopes that the ability to feel 
objects will give Niska a competitive advantage. 

A team led by Ronald Triolo, a biomedical 
engineer at Case Western Reserve University in 
Cleveland, Ohio, has a similar strategy for the 
FES cycling event, in which contestants with 
spinal-cord injuries will pedal for 750 metres 
around a circular track. Many of the com- 
petitors, including Bergeron, use electrodes 
on the skin to stimulate the leg muscles. But 
the Cleveland system — originally designed 
to allow people with lower-limb paralysis to 
walk with the help of crutches — features elec- 
trodes surgically implanted in the leg muscles. 
Using an external device, the wearer chooses a
menu option, such as ‘sit’. An implanted pulse
generator activates the electrodes that cause the
muscles to contract in the correct order.

Functional electrical stimulation helps people with spinal injuries race bikes.

After Triolo heard about the Cybathlon, he 
realized that he could add cycling to his volun- 
teers’ exercise regimes. His team has equipped a 
recumbent tricycle with sensors that detect the 
angle of the cyclist’s leg as he or she pedals, and 
automatically change the stimulation patterns 
so that one leg pushes while the other pulls. 
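
The alternating pattern described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the Cleveland team's controller: the function name, the use of a single crank angle as the sensor reading, and the 180-degree switch point are all assumptions made for the example.

```python
def leg_actions(crank_angle_deg: float) -> dict:
    """Choose a push/pull action for each leg from the pedal-crank angle.

    Hypothetical convention (an assumption, not from the article):
    during the first half of a revolution the left leg pushes and the
    right leg pulls; during the second half the roles swap.
    """
    # Wrap any angle, including negative ones, into the range [0, 360).
    angle = crank_angle_deg % 360.0
    if angle < 180.0:
        return {"left": "push", "right": "pull"}
    return {"left": "pull", "right": "push"}


if __name__ == "__main__":
    # Step through one revolution in quarter turns.
    for angle in (0, 90, 180, 270):
        print(angle, leg_actions(angle))
```

A real system would presumably stimulate several muscle groups per leg with overlapping timing; the two-state switch here only shows the idea of driving the stimulation pattern from a measured angle.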

Triolo says that all 27 of the people implanted 
with his electrodes want to try cycling. After 
putting them through qualifying trials, he is 
down to a few finalists for the Cybathlon. “We 
want to go to Switzerland and win this thing,” 
he says. “Then I’d like to use that as a springboard
to build an exercise programme here.”


EYES ON THE PRIZE 

This competitiveness is a far cry from Triolo’s 
initial reaction to the Cybathlon, which was that 
the competition was a foolish idea. “We should 
find a way to collaborate internationally on 
these problems rather than compete,” he recalls
saying — something that he and Riener say they 
still hear from some in the prosthetics field. 

Triolo eventually came around: he decided 
that the Cybathlon would at least be a good 
learning experience. Riener himself hopes that 
bringing the competition into the open will 
spur creativity better than the conventional academic
process, which is hampered by researchers’
concerns about their intellectual property
and competitiveness for grants. 

Karim Lakhani, an economist who studies 
innovation at the Harvard Business School in 
Boston, Massachusetts, notes that competi- 
tions also force researchers to finish their work 
quickly and eliminate doubts about feasibil- 
ity that prevent them from starting in the first 
place. He points to the self-driving car, which 
languished in development for decades until 
2005, when the US Defense Advanced Research 
Projects Agency held a race with a $2-million 
prize. The contest eventually drew interest 
from Google, which is now testing such cars on
the roads. “This contest will serve
the same way,” says Lakhani. The
Cybathlon will not award monetary 
prizes, just medals. But his research 
suggests that the recognition 
enjoyed by the winners could be just 
as motivating (K. J. Boudreau et al. 
RAND J. Econ. 47, 140–165; 2016).

Perhaps the greatest advantage 
of prizes is that they give unknown 
contestants an opportunity to com- 
pete alongside big, well-known 
players, says Lakhani. The Cybath- 
lon has drawn plenty of both. Otto 
Bock HealthCare, a multibillion- 
euro company based in Duderstadt, 
Germany, and the world’s largest 
manufacturer of prosthetic limbs, 
has entered three Cybathlon events. 
One is the powered-exoskeleton 
race, in which contestants with spinal injuries 
will use an external support system to navigate 
obstacles similar to those in the powered-leg 
race. Otto Bock’s pilot, Lucia Kurs, lost the use of 
her legs to spinal tumours. Now in her 60s, she 
can walk 12 kilometres using the firm’s commer- 
cial leg brace, which has sensors, electronics and 
motors to guide the knees and ankles through a 
normal leg swing. 

“We're showing off, and checking out other 
manufacturers” at the Cybathlon, says Christof 
Küspert, a product manager at Otto Bock. But
he says the company is also interested in learn- 
ing about innovative prototypes from dark- 
horse developers at universities. 

One smaller developer is Jesús Tamez-Duque,
managing director of the start-up INDI
Engineering and Design in Monterrey, Mexico, 
who is entering a prototype for an exoskeleton 
much cheaper than Otto Bock’s US$75,000 
model. The device's joints are moved by wind- 
screen-wiper motors, and much of the body is 
3D printed. INDI’s competitor uses a joystick 
attached to his crutches to choose from several 
programmed movements, such as climbing 
stairs or sitting. 

Tamez-Duque hopes that the Cybathlon will 
attract collaborators and prove that Mexico can 
be a player in the field. “The way we see it, the 
Cybathlon is a competition that concentrates 
the top-notch robotics labs in the world,” 
he says. “We're still working on getting that 
representation so that other people believe we 
can actually add value to them.”

The Cybathlon will be back in 2020, as a 
seven-day event in Tokyo, coinciding with the 
Olympics. It will have new events for competi- 
tors with visual impairments, and will conduct 
some races outside the stadium. But for com- 
petitors in the first Cybathlon, being at the cut- 
ting edge is already a thrill. “This is Iron Man, 
this is Avatar,” says Bergeron. “It’s a combination
of BCI and exoskeletons all over the place.”


Sara Reardon is a staff reporter for Nature in 
Washington DC. 
An electronic-waste recycling factory in Hubei, China.


Take responsibility for 
electronic-waste disposal 


International cooperation is needed to stop developed nations simply
offloading defunct electronics on developing countries, argue Zhaohua Wang,
Bin Zhang and Dabo Guan.


The world is producing ever more
electrical and electronic waste.
The quantity of dumped computers,
telephones, televisions and appliances
doubled between 2009 and 2014, to
42 million tonnes per year globally.
Developed countries, especially in
North America and Europe, produce
the most e-waste (see ‘Unfair flow’). The
United States generates the largest amount,
and China the second most.

Much of this waste ends up in the 
developing world, where regulation is lax. 
China processed about 70% of the world’s 
e-waste in 2012; the rest goes to India and
other countries in eastern Asia and Africa, 


including Nigeria. Non-toxic components
— such as iron, steel, copper and gold — 
are valuable, so are more frequently recycled
than toxic ones. Disposal plants
release toxic materials, volatile organic 
chemicals and heavy metals, which can 
harm the environment and human health. 
Lead levels sampled in the blood of children in
the e-waste-processing town of Guiyu, China,
were on average three times the safe limit
recommended by the US Centers for Disease
Control and Prevention. In California, peregrine
falcons have been threatened: polybrominated
diphenyl ethers, which are widely used as
flame-retardants in electronics, have been
discovered in their eggs.


4 AUGUST 2016 | VOL 536 | NATURE | 23


© 2016 Macmillan Publishers Limited, part of Springer Nature. All rights reserved.

A global approach to managing the 
volume and flow of e-waste is urgently 
needed. This requires: an international 
protocol on e-waste; funding for technol- 
ogy transfer; firmer national legislation on 
imports and exports; and greater aware- 
ness of the problem among consumers. 
Researchers and regulators should build a 
global e-waste flow system that covers the 
whole life cycle of electrical goods, includ- 
ing production, usage, disposal, recovery 
and remanufacturing. 

Beyond better recycling, the ultimate 
aim should be a circular economy of 
cleaner production and less wasteful 
consumption, including the embrace 
of a sharing economy and cloud-based 
technologies with smaller material foot- 
prints. As the world’s largest producer of 
electronic goods and recipient of the most 
e-waste, China should take the lead. 


BAD RUBBISH 

Most developed countries have strict 
regulations governing the disposal of 
electronic and electrical waste. European 
countries, the United States and others 
have official ‘take-back’ systems, which
recover and dispose of e-waste in an envi- 
ronmentally friendly way. In 2014, these 
processed 6.5 million tonnes generated 
by 4 billion people, recycling valuable 
materials back into the supply chain. The 
European Union has two comprehensive 
directives: the Restriction of Hazard- 
ous Substances and Waste Electrical and 
Electronic Equipment. Yet the EU disposes
domestically of only 40% of the e-waste it
generates, and the United States and Canada
only 12% of theirs.

These rich nations with strict legisla- 
tion send most of their e-waste to devel- 
oping countries. India and China's e-waste 
legislation is inefficient and irregularly 
enforced. China's system is poorly coor- 
dinated; it involves more than ten depart- 
ments publishing regulations, imposing 
disposal fees, providing subsidies and 
monitoring pollution and illegal imports 
with little crosstalk. Many poor nations, 
especially in Africa, have few or no laws 
on e-waste. 

Around half the components in any per- 
sonal computer contain mercury, arsenic 
and chromium — all are toxic. The move- 
ment of this waste in and out of countries 
is not being tracked. The Basel Convention 
of the United Nations, which concerns the
movement of hazardous waste across bor- 
ders, is meant to prevent developed coun- 
tries from illegally dumping hazardous 
waste in developing countries. But only 
87 parties — and not the United States — 
have ratified it. Few developing countries 
control imports of toxic e-waste: for exam- 
ple, India’s law fails to ban it. This resulted 
in 50,000 tonnes of such waste from devel- 
oped countries being dumped in India 
in 2012. The shady trading of trash as
‘used electronics’ bypasses such laws
entirely.

In many developed countries, such as those in
the EU, manufacturers are required to take
responsibility for the disposal of their electrical and
electronic products. However, three-quarters
of products sold in Europe are made
in developing countries such as China and
India. So such measures only worsen the
situation in poor nations.


UNCERTIFIED DISPOSAL 

A few developing countries, including 
China, have made producers responsi- 
ble for some disposal. Since January 2011, 
Chinese producers have had to pay disposal 
fees for five categories of home appliance
(televisions, air conditioners, refrigerators,
washing machines and computers). The
list grew to 14 in March this year. But the
scheme pays only for e-waste processing, not
collection, so e-waste is just not collected.

A worker dismantles a motherboard by hand in Sangrampur, India.

China has 106 enterprises certified by 
the government as capable of dismantling 
100 million defunct home appliances per 
year. Together, these companies process 
only 40 million items. The rest is recovered
by unskilled peddlers going door-to-door.
There are 300,000 such people
in Beijing alone. They sell to uncertified
disposal plants; these pay higher prices
than the certified ones, which have larger 
overheads. Elsewhere — in Guiyu, for 
example, or in Agbogbloshie in Ghana— 
people sift e-waste from household rub- 
bish and sell it for uncertified disposal. In 
China, this ‘grey’ market is estimated to 
be worth US$15 billion. Large amounts of 
government subsidies intended for such 
disposal lie idle. In 2013, only 630 million 
yuan (US$94 million) of the 2.81 billion 
yuan available was spent. 

By contrast, EU-based manufacturers are 
encouraged, through legislation, to support 
the dismantling of their products and the 
recovery of materials through environmentally
friendly product design. In Japan,
producers must collect used devices; consumers
are required to deliver their e-waste
to the manufacturer or to certified collection
sites, and pay a compulsory recycling
fee to the waste-disposal company.

UNFAIR FLOW
Most electronic waste from developed countries ends up in poor nations that lack regulation. China processed around 70% of the world’s e-waste in 2012; the rest goes to India and other countries in eastern Asia and Africa, including Nigeria.
The United States produces the largest total amount of e-waste per year, at 7.1 million tonnes.
E-waste generated each year: world, 41.8 million tonnes (about 42 million); Asia, 16 million tonnes.

Globally, only 6.5 million tonnes of 
e-waste (about 15% of the total) were for- 
mally reported as disposed of through 
national take-back systems in 2014. In 
China, the proportion is 24–30%. The
rest is sent either to landfill or to the black 
market. 
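
The proportions quoted in this section can be checked with simple arithmetic. The short Python sketch below is offered only as a worked check, using figures that appear in the article (6.5 million of 41.8 million tonnes; 630 million of 2.81 billion yuan; 40 million of 100 million appliances); the helper name `share_percent` is an illustrative assumption.

```python
def share_percent(part: float, total: float) -> float:
    """Return `part` as a percentage of `total`."""
    return 100.0 * part / total


# E-waste handled by national take-back systems in 2014,
# against the 41.8-million-tonne world total (million tonnes).
take_back_share = share_percent(6.5, 41.8)

# Chinese disposal subsidies spent in 2013 (million yuan).
subsidy_share = share_percent(630, 2810)

# Appliances processed by China's certified dismantlers,
# against their certified annual capacity (million items).
capacity_share = share_percent(40, 100)

if __name__ == "__main__":
    print(f"Take-back share of e-waste: {take_back_share:.1f}%")
    print(f"Subsidy spent: {subsidy_share:.1f}%")
    print(f"Certified capacity used: {capacity_share:.0f}%")
```

The first result is consistent with the "about 15%" stated above; the other two show that roughly a fifth of the available subsidy was spent and that certified plants ran at well under half of capacity.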


FOUR STEPS 
The following four steps need to be taken 
to make e-waste management sustainable. 

First, a formal global protocol on 
e-waste trading needs to be built under the 
Basel Convention, and the United States 
must be encouraged to participate. The 
convention currently covers only the trad- 
ing of toxic waste; it should be extended 
to encompass e-waste and second-hand 
electronic products. Strict criteria must 
be agreed globally to distinguish products 
by durability, usability and safety. 

Second, domestic regulations need 
strengthening and enforcing: those operat- 
ing illegally should be fined or prosecuted. 
Developed countries must crack down on 
defunct products being traded as used ones. 
Developing countries must ban imports of 
toxic e-waste. Customs duties on e-waste 
should be increased. 

Third, the United Nations’ Solving the
E-waste Problem Initiative must take
on many more roles. It should launch
a global industry association to certify
processing firms that meet agreed legal,
technical and environmental criteria.
It should encourage the transfer of processing
and recycling technology from
developed to developing nations. It
should create a global e-waste disposal
fund to which exporting countries and
manufacturers would contribute for each
product they sell.

Figure notes (‘Unfair flow’). E-waste generated each year: Americas, 11.7 million tonnes; Europe, 11.6 million tonnes; Africa, 1.9 million tonnes; Oceania, 0.6 million tonnes. Map legend: e-waste generation in 2014 (kilograms per capita). Norway generates the most e-waste per person, at 28.3 kg per capita. African nations produce little e-waste, with Equatorial Guinea creating most (10.8 kg per capita). China ranks second for total e-waste generation (6 million tonnes), but low relative to its population size (4.4 kg per capita).

Any country responsible for disposal 
should receive a fee that is 2-5% of the 
original production cost, and ensure that 
an appropriate and verifiable disposal 
procedure is implemented. Certified 
firms would get subsidies from the fund 
according to how much they process. 
The same industry body should launch a 
global monitoring system to track flows 
around the world over the whole life cycle 
of e-products. Components such as circuit 
boards and the compressors in refrigera- 
tors and air-conditioners could be labelled 
with radio-frequency identification tags. 

Fourth, consumers’ responsibility for 
e-waste needs to be enshrined in regula- 
tions, taking lessons from Japan. Separate 
e-waste bins should be provided, with 
penalties for those who do not use them. 
Deposit mechanisms could be used when 
purchasing electrical goods, and people 
can get the money back when they send 
their waste to certified collectors. 

It is time for consumers, researchers, 
manufacturers, nations and international
regulators to direct some of the passion and
creativity they have for new gadgets towards
responsibly dealing with old ones.
Austerity measures in Greece, part of the eurozone, have sparked unease among pensioners.


Singular currency 


Jonathan Portes parses Joseph Stiglitz’s analysis of the 
euro in the context of the global financial crisis. 


Who killed the euro? With the
spotlight now on Britain’s relationship
with the currency’s
crucible, the European Union, that question
has an edge. Were the culprits the visionary 
politicians who ignored structural differ- 
ences between European economies when 
they conceived the euro in the 1980s — most 
notably Jacques Delors, president of the 
European Commission at the time? Was it 
the neoliberal international economic and 
financial establishment, in Washington DC, 
New York and London as well as Frankfurt 
and Brussels, that thought markets (even 
financial ones) were rational and self-cor- 
recting? Was it Jean-Claude Trichet, gover- 
nor of the European Central Bank during the 
2008 financial crisis, who made the people of 
Ireland pay for the corruption of their bank- 
ers by forcing their government to stand 
behind their broken banks? Or perhaps the 
putative ‘Swabian housewife’ often invoked 
by German Chancellor Angela Merkel, who 
balances her budget and cannot see why the 
same logic does not
apply to governments? 
In the first chapter 
of economist Joseph 
Stiglitz’s The Euro, we 
learn that they all did 
it. And that it is all the 
fault of contemporary 
capitalism. Most of the 
book is devoted to this 
topic, and it serves as 
a useful guide to the 
many failures of euro- 
zone policymakers 
in the past quarter 
of a century. Stiglitz speculates boldly and 
cogently about the zone’s future, but The Euro
is on the whole too scattershot either to diag- 
nose the illness or to prescribe a viable cure. 
The Euro: How a Common Currency Threatens the Future of Europe
JOSEPH E. STIGLITZ
W. W. Norton: 2016.

Stiglitz’s thesis (or rather, loose assemblage
of theses) explains too much. The
gross domestic product of the eurozone is
now barely higher than it was immediately
before the crisis, and the euro is indeed
central to explaining why. There is, however,
a lively debate over the inevitability of this 
failure. Some argue that the euro was always 
doomed because of divergences in income 
and economic structures between eurozone 
countries, and the lack of a US-style politi- 
cal union to make significant fiscal transfers 
possible. Sooner or later, a big shock — a 
rapid rise in unemployment, a sharp fall 
in capital inflows — would hit one country 
or more, and without being able to devalue 
their currency, nations would be unable to 
adjust without excessive economic damage. 
It was bad luck that when the shock did come 
(in 2008) it was as big as it was, especially for 
countries such as Spain and Ireland. It was 
also both predictable and predicted. 
Others, such as economics journalist 
Martin Sandbu in his excellent Europe’s
Orphan (Princeton Univ. Press, 2015), 
argue that the eurozone’s dismal economic 
performance is a result less of the currency’s 
design than of a set of avoidable and entirely 
unnecessary policy failures. These include 
premature and excessive austerity measures, 
especially in southern European countries, 
and slowness in restructuring debts, in particular
those of the Greek government. Stiglitz leans
towards the view that failure was inevitable. 
Unlike Sandbu, he doesn't provide much in 
the way of evidence one way or the other. 
Similarly, he fails to explain what he 
thinks has happened outside the eurozone. 
It's true, as he shows, that the eurozone’s pro- 
ductivity performance since 2008 has been 
abysmal. But productivity has grown even 
more slowly in Britain (which held onto its 
own currency), although employment has 
held up better. A more nuanced account 
would distinguish between two sets of fac- 
tors. That is, those that have been present 
in most advanced countries to some degree 
(including excessive austerity and the still- 
unexplained productivity slowdown), and 
those specific to the eurozone and its institu- 
tions — in particular, the inability to adjust 
exchange rates, and the lack of a central bank
able to act as a credible lender of last resort. 
Stiglitz also speculates about the near 
future. Unsurprisingly, given the shopping 
list of flaws that he sees in the euro, he thinks 
that radical reform is necessary for it to sur- 
vive. That includes fiscal policy that is both 
more expansionary — with greater scope 
to increase public investment — and more 
countercyclical, so that spending can rise 
rather than fall in a recession. It would also 
include greater regulation, especially for the 
financial sector, and a new mandate for the
European Central Bank that focuses less heav- 
ily on price stability and more on growth and 
employment. Yet Stiglitz fails to distinguish 
clearly between reforms that are essential to 
save the euro and those that would be ‘nice
to have’, and that, I would bet, he would recommend
equally to non-eurozone countries
such as Britain or the United States.

If such reforms do not happen, Stiglitz
recommends either an “amicable divorce”
(dissolution of the eurozone) or a
move to a more “flexible euro”. This is the
most innovative and interesting part of 
the book. He argues for a new, and much 
more heavily regulated and controlled, 
international monetary system. States 
would have much more direct control of 
both money creation internally and their 
current account balances externally. He 
advocates market-based mechanisms for 
both. “Credit auctions” would demand 
that private banks pay for the right to 
expand the money supply, and “trade 
chits” would force importers to effec- 
tively buy tradeable licences to import. 
Nevertheless, this would be a radical 
move towards greater state control of the 
economy. 

This is potentially very exciting. Few 
would defend the current organization 
of the international financial system, 
and radical ideas based in sound economics
are exactly what we need. The state
control that Stiglitz is advocating would
be viewed with scepticism
at the International Monetary Fund or
the US Treasury, but would no longer be
regarded as laughable or heresy. And a
proposal from someone of Stiglitz’s eminence
has a good claim on our attention.
Unfortunately, this part of the book is 
underdeveloped. For example, it seems 
unlikely that as good an economist as 
Stiglitz hasn’t thought about how the 
spread of shadow banking — borrowing 
and lending outside the traditional bank- 
ing system — has made it much more dif- 
ficult to control credit creation. And he 
must be aware of the administrative prob- 
lems that trade in services (particularly 
tourism) would pose for his chit system. 
But these issues are not addressed. 

Will the eurozone respond to Britain's 
vote to leave the EU with a rapid move 
towards greater integration, or will the 
tensions identified by Stiglitz pull it apart? 
Perhaps, as has mostly been the case so far, 
it will continue to muddle through. But 
it cannot avoid the questions that Stiglitz 
poses for ever — even if he is a long way 
from providing convincing answers.


Jonathan Portes is principal research 
fellow at the UK National Institute of 
Economic and Social Research in London. 
He was chief economist for the UK Cabinet 
Office during the 2008-09 financial crisis. 
e-mail: j.portes@niesr.ac.uk 
A cognitive case for
un-parenting 


Josie Glausiusz relishes Alison Gopnik’s study on how 
child-rearing demands the embrace of messy realities. 


An Amazon trawl for “parenting books”
last month offered up 186,262 results.

Titles included Daniel Siegel and 
Tina Payne Bryson’s The Whole-Brain Child: 
12 Revolutionary Strategies to Nurture Your 
Child’s Developing Mind (Delacorte, 2011), 
Elaine Glickman’s Your Kid’s a Brat and It’s All
Your Fault (TarcherPerigee, 2016) and Have a
New Kid by Friday by Kevin Leman (Revell,
2012). This is less genre than tsunami. 

Yet, as Alison Gopnik notes in her deeply 
researched book The Gardener and the Car- 
penter, the word parenting became common 
only in the 1970s, rising in popularity as tradi- 
tional sources of wisdom about child-rearing 
— large extended families, for example — fell 
away. Gopnik, a developmental psycholo- 
gist (or as she describes herself, “a bubbe at 
Berkeley, a grandmother who runs a cognitive 
science laboratory”), argues that the message 
of this massive modern industry is misguided. 

It assumes that the ‘right’ parenting tech- 
niques or expertise will sculpt your child 
into a successful adult. But using a scheme to 
shape material into a product is the modus 
operandi of a carpenter, whose job it is to 
make the chair steady or the door true. There 
is very little empirical evidence, Gopnik says, 
that “small variations” in what parents do 
(such as whether they sleep-train) “have reli- 
able and predictable long-term effects on who 
those children become”. Raising and caring
for children is more like tending a garden:
it involves “a lot of exhausted digging and
wallowing in manure” to create a safe,
nurturing space in which innovation,
adaptability and resilience can thrive. Her
approach focuses on helping children to find
their own way, even if it isn’t one you’d choose
for them. The lengthy childhood of our species
gives kids ample opportunity to explore,
exploit and experiment before they are turned
out into an unpredictable world.

Children learn well from undirected play.

The Gardener and the Carpenter: What the New Science of Child Development Tells Us About the Relationship Between Parents and Children
ALISON GOPNIK
Farrar, Straus & Giroux: 2016.

In Gopnik’s not-parenting approach, the 
rampant disorder of genetic variation (or, 
to use her technical term, “mess”) becomes 
a wellspring for creativity, contributing to 
the wide range of children’s temperaments 
and abilities. Some children are risk-takers; 
others are timid; some are highly focused (an 
advantage in a test-obsessed school system) 
or natural hunters (“constantly on the alert 
for even subtle changes in the environment”). 
Throughout history, she argues, that mix has 
bred resilience in societies faced with chal- 
lenges, such as early nomads’ constant need 
to confront new environments. People with 
more conservative temperaments, for exam- 
ple, ensure some security for the risk-takers. 

Gopnik reveals how the parenting model 
can affect how children explore. She describes 
a wide range of experiments showing that 
children learn less through “conscious and 
deliberate teaching” than through watching, 
listening and imitating. Among the K’iche’ 
Maya people of Guatemala, even very young 
children with little formal schooling can 
master difficult and dangerous adult skills,
such as using a machete, by watching adults
engaging in these tasks in slow and exagger- 
ated fashion. In one of Gopnik’s own experi- 
ments using a “blicket detector” (a box that 
lights up and plays music when a certain com- 
bination of blocks is placed on it) four- and 
five-year-olds worked out that unusual
combinations rather than individual
blocks did the trick — and younger kids 
were more skilled than older ones at 
finding unlikely options. 

She also cites a number of studies on 
play, which is so crucial to human devel- 
opment that children engaged in it even 
in Nazi concentration camps. Research 
on dolphins, crows and foxes reveals how 
playing at hunting, digging and fighting 
develops the skills the animals need as 
adults. Through play, young rats produce 
chemicals called cholinergic transmit- 
ters, implicated 
in plasticity in 
‘social’ areas of 
the brain. Rats 
deprived of play 
when young can 
defend, attack or 
approach others 
as adults, but 
fail to know “when to do what”, she notes.
Most human parents, Gopnik writes, 
“have a vague sense that play is a Good 
Thing”. But as an aim of parenting, play 
is paradoxical, she claims, because it is 
essentially goalless. Elizabeth Bonawitz, 
a researcher in computational cognitive 
development, found that when adults 
instructed children on how to play with 
a squeaking toy, the children imitated 
them. When left to their own devices, the 
children were more likely to try different 
actions until they had discovered every- 
thing the toy could do. 

Gopnik can be scathing in her censure 
of the modern educational system, which 
increasingly stresses high-stakes testing. 
That trend, she notes, parallels the rise in 
diagnoses of attention deficit hyperactiv- 
ity disorder (ADHD), which in the United 
States particularly is often treated with 
drugs that can have serious side effects, 
including addiction. More palpable, how- 
ever, is her devotion to the subjects of her 
research, including her grandchildren 
Augie and Georgie, her “true muses”,
whose antics pepper her text. 

Those antics remind me of my own 
delightfully disorderly, creative five-year- 
old twins and their in-the-now mischief 
and affection. As Gopnik concludes: 
“The most important rewards of being a 
parent aren't your children’s grades and 
trophies — or even their graduations and 
weddings. They come from the moment- 
by-moment physical and psychological 
joy of being with this particular child, and 
in that child’s moment-by-moment joy in 
being with you.”


Josie Glausiusz writes about science and 
the environment for magazines including 
Nature, National Geographic and Hakai. 
Twitter: @josiegz 
The Three Gorges Dam on the Yangtze River is one of the world’s largest power stations.

A hydrological history


Andrea Janku enjoys a study of the nation-building 
role of China’s great rivers, the Yellow and the Yangtze. 


Nearly 70 years ago, Chinese anthropologist
Fei Xiaotong published
From the Soil (1947). The Chinese
people, he wrote, were “inseparable from the
soil”, which had produced “a glorious history”,
but one that was “limited by what could be
taken from the soil”. If that book was the por-
trait of a rural and inward-looking country,
literally stuck in the famous yellow earth
(the loess of the North China Plain), science
writer Philip Ball’s history of China, The
Water Kingdom, is very much the opposite.
It is the portrait of a civilization permeated
by water, with patterns of thought influenced 
by the centrality of water to everyday life and, 
echoing that, practical affairs shaped by philo- 
sophical ideas based on the principle of flow. 
The result is, Ball writes, “an intimate con- 
nection between hydraulic engineering, gov- 
ernance, moral rectitude and metaphysical 
speculation that has no parallel anywhere in
the world”. On this premise he builds a picture
of the nation, from its geographical and ideo- 
logical foundations to the environmental and 
political predicament in which China (and 
not only China) now finds itself. 

The Water Kingdom’s structure is predom- 
inantly thematic rather than chronological. 
So the first chapter, introducing the Great 
Rivers, the Yangtze and the Yellow, leads 
from the Great Yu, the mythical ruler who,
according to tradition, conquered the floods
more than 4,000 years ago, to twentieth- 
century Communist leader Mao Zedong, 
who repeatedly reasserted his power by 
swimming in the Yangtze. It interweaves 
more stories of the Yangtze: seventeenth- 
century explorer Xu Xiake’s search for 
its source, twelfth-century poet Lu You’s 
descriptions of commercial life along its 
banks, and more recent Western visitors’ 
accounts of the colossal Three Gorges Dam. 

On this epic journey, Ball explores 
mythological accounts of dragons and floods, 
along with early philosophical texts such as 
the teachings of Mencius from the fourth
century BC, to unravel the origins of Chinese
political ideology. That is, the idea that he 
who controls the water controls the people, 
which links the earliest cultural heroes to 
modern leaders from Republic of China 
founder Sun Yat-sen to Mao himself. Ball 
subsequently covers Zheng He's maritime 
explorations at the height of Ming-dynasty 
power in the fifteenth century, and the 
centrality of the Yellow River-Grand Canal 
administration to the state bureaucracy in 
the eighteenth. He notes how the late empire 
turned into a “hydraulic state”, increasingly
mired in systemic problems that finally 
collapsed under the pressure of internal 
rebellion and the imperialist onslaught. 

The centrality of water even plays out in 
the art of war, to which Ball devotes a chap- 
ter. In some of the most dramatic conflicts, 
rivers were harnessed as weapons. In 204 BC,
for instance, an intentional rupturing of the 
Wei River dams led to the victory of the Han- 
dynasty forces. And in 1938, the Nationalist 
government attempted to stop the advancing 
Japanese army by breaching the Yellow River 
dykes, with disastrous consequences for the 
Chinese people — killing hundreds of thou- 
sands and making millions homeless. Tech- 
nological and political parameters changed 
fundamentally in the twentieth century, yet 
hydraulic nation-building and the myths 
that surrounded it assumed an ever more 
important role. Mao in particular relished 
the role of the great leader conquering the 
floods. Ball ends that strand of the narrative 
with the Three Gorges Dam, first conceived 
by Sun Yat-sen and finally completed in 
2012. He even covers the depiction of water 
in Chinese art through the ages, exploring its 
aesthetic, philosophical and political dimen- 
sions. The journey ends with a pertinent 
chapter on China’s current environmental 
crisis. Another hydraulic-engineering project 
on an unprecedented scale, the South-North 
Water Transfer Project, is now under way, 
meant to tackle water scarcity in the north 
(J. Barnett et al. Nature
527, 295-297; 2015).

Mao Zedong by the Yangtze.

The Water Kingdom: A Secret History of China
PHILIP BALL
Bodley Head: 2016.

Telling the history of Chinese civilization
from the perspective of water is rewarding,
because it can link the history of ideas and
beliefs, technology and warfare, politics and
the arts. But as with any general history,
it risks essentializing China and making
its history seem more
uniform than the actual record justifies. The
most obvious example is the major shift in the 
history of the Yellow River in the late tenth 
century. It is only from then that the river 
became a constant threat, bursting its dykes 
and flooding the countryside in ever more 
devastating cycles, and changing its course 
repeatedly in dramatic ways after nearly a 
millennium of relative stability. And so it is 
also only from then that controlling the river 
became tantamount to controlling the people, 
and that state and society became trapped in 
an increasingly unsustainable hydraulic infra- 
structure. That complex system of dykes and 
canals, with the Yellow River and the Grand 
Canal at its heart, devoured enormous 
resources — a quandary called “technological 
lock-in” by historian Mark Elvin. Moreover, 
Ball’s focus on the state means that he fails 
to mention the role of small-scale irrigation 
and conservation projects that are common,
particularly in southern China, and largely
managed and funded by the local gentry. 

Still, this is a convincing introduction to 
Chinese history. Rather than perpetuating 
stereotypes, it boldly navigates the treacher- 
ous and often-avoided terrain long dominated 
by influential but spurned theories, such as 
the idea, promoted by sinologist Karl August 
Wittfogel, of China as a despotic hydraulic
society. It also complements and complicates 
Fei Xiaotong’s idea of an earthbound civiliza- 
tion — a metaphor that has had a huge impact 
in China itself. In 1988, the six-part Chinese 
television documentary River Elegy depicted 
the country as weak and backward, closed off 
from the world by the Great Wall and stuck in 
the mud of the Yellow River, contrasted with a 
progressive, open, oceanic conceptualization 
of Western civilization. It has taken a general- 
ist to turn the rich but rather dry literature on 
the history of water in China into an acces- 
sible history. Why it is a secret one, however, 
remains a mystery.


Andrea Janku is senior lecturer in Chinese 
history at SOAS, University of London. She 
is currently working on a monograph on 
the history of famine, Integrating the Body 
Politic. 

e-mail: aj7@soas.ac.uk 
China: change tack to
boost basic research 


We agree that China must invest 
more in basic research, but fear 
that simply casting more seeds
on infertile ground will not yield
the anticipated fruit of innovation 
(W. Yang Nature 534, 467-469; 
2016). More bottom-up initiatives 
for early-career researchers are 
required for long-lasting change. 

A priority is to build capacity 
in critical thinking and self- 
determination, both of which are 
cornerstones of creative enquiry. 
Early-career scientists should 
be trained and judged on more 
than just technical competence. 
Furthermore, strategies are 
needed to give young researchers 
in China the same opportunities 
that leading Western institutions 
take for granted. Limited access 
to key information services and 
an educational emphasis on 
written knowledge over verbal 
communication skills do not 
foster scientific debate. 

Structural reform of funding 
silos and hierarchical power 
structures in science institutions 
is essential for cross-sector 
collaboration — a crucial 
contributor to scientific progress 
in Western countries. Chinese 
funding schemes such as the 
10,000 Talents programme could 
be redistributed across an evened- 
out power structure, and associate 
professors allowed to supervise 
PhDs and lead their own groups. 

The applied-research sector 
receives much more funding than 
basic science does (see Nature 534, 
452-453; 2016), so it could help by 
promoting the possible benefits 
of its work for basic research in 
reports and funding applications. 
Raphael K. Didham, Chao-Dong 
Zhu Institute of Zoology, Chinese 
Academy of Sciences, Beijing, China. 
zhucd@ioz.ac.cn 


China: standardize 
R&D costing 


Figures for China's basic-research 
spending should not be taken at 
face value (see W. Yang Nature
534, 467-469; 2016). Before
comparing gross expenditure
on research and development
(R&D) with that in developed 
countries, China's official 
statistics first need bringing into 
line with international standards 
for collecting and reporting 
R&D costs (see
go.nature.com/2ab54rh).

Unlike most other countries, 
China's government has no 
official system for assessing R&D 
expenditure. These costs are 
instead embedded in the overall 
costs for science and technology 
(see Y. Sun and C. Cao Science 
345, 1006-1008; 2014). And 
China's R&D statistics exclude 
salaries for university faculty 
members and postdocs — a 
significant component of R&D 
expenditure in other countries. 
Capital spending on big facilities 
and their operation budgets, 
known as fixed R&D costs in 
many Western countries, are 
not counted. For instance, 
only about US$18 million of 
the $200 million spent on the 
first construction phase of the 
Shanghai Synchrotron Radiation 
Facility was designated as 
an R&D cost (see
go.nature.com/2alwvbc).

Based on such considerations, 
China probably spent about 
double the official figure of 4.7% 
of its total R&D budget on basic 
research in 2013, as quoted by 
Yang. This is still significantly 
less than Japan or the United 
States. 

Yutao Sun Dalian University of 
Technology, China. 

Cong Cao The University of 
Nottingham Ningbo, China. 
sunyutao82@dlut.edu.cn 


Mentoring female 
scientists in Africa 


As founding members of the 
Higher Institute for Growth in 
Health Research (HIGHER) 

for Women in Cameroon, our 
mission has been to help young 
women to enter and sustain 
careers in biomedical science 
through a mentoring programme 
(www.higherwomencam.org).

Mentored female researchers 
spend more time on research 
and have more publications 
and greater career satisfaction 
than do their unmentored peers 
(W. Levinson et al. West J. Med. 
154, 423-426; 1991). In 
disadvantaged settings such as 
Cameroon, however, potential 
female mentors are in short 
supply. 

Led by one of us (R.G.F.L.)
and funded by the Special 
Programme for Research and 
Training in Tropical Diseases 
and Canada’s International 
Development Research 
Centre, the HIGHER Women 
consortium has recruited more 
than 100 members in the past 
year. Its 20 or so mentors hold 
leading positions in Cameroon 
in academic institutions, 
research organizations or 
government agencies; each has 
four or five mentees on average. 

The consortium follows 
a holistic approach, taking 
into account the pressures on 
women in a traditional culture 
and encouraging career-life 
balance through planning and 
coordination. It is developing 
scientists’ skills in grant writing, 
leadership, ethics, research 
quality and time management. 
Rose G. F. Leke, S. Kwedi Nolna 
University of Yaoundé, Cameroon. 
roseleke@yahoo.com 


Refereed science to 
guide action on EDCs 


In a non-peer-reviewed venue 
(Nature 535, 355; 2016), Daniel 
Dietrich et al. put forward 
apparently unsubstantiated 
arguments that in effect dismiss 
thousands of peer-reviewed 
academic studies and rigorous 
evaluations of endocrine- 
disrupting chemicals (EDCs) 
by independent scientists
and organizations such as the
World Health Organization
(WHO), the United Nations
Environment Programme,
the Endocrine Society and the
International Federation of
Gynecology and Obstetrics
(see, for example,
go.nature.com/2adgma2).

Identification of hazards 
associated with EDCs relies 
on randomized mechanistic 
studies in animals and 
observational epidemiological 
studies in humans. Randomized 
trials of direct chemical 
exposures in humans present 
serious ethical and other 
challenges. Using approaches 
developed by the WHO 
and the Intergovernmental 
Panel on Climate Change 
to account for the totality of 
the laboratory and human 
evidence and assess the strength 
of the evidence, the costs of 
continued inaction on EDCs 
are estimated to be more than 
€150 billion (US$167 billion) 
annually (L. Trasande et al.
J. Clin. Endocrinol. Metab. 100,
1245-1255; 2015).

Proactive prevention measures 
backed by strong scientific 
criteria are therefore needed to 
prevent disease and disability 
across the life course. Regulatory 
decision-making should rely 
on peer-reviewed research to 
evaluate EDCs (L. Trasande 
et al. J. Epidemiol. Comm. Health 
http://doi.org/bm5q; 2016). 
Many of us have productively 
worked to achieve scientific 
consensus for the proposed 
criteria for EDCs in Europe (see 
go.nature.com/2awvrsn). In our 
view, Dietrich et al. do nothing to 
advance the debate or scientific 
knowledge on this important 
human health issue. 

Leonardo Trasande NYU School 
of Medicine, New York City, USA. 
leonardo.trasande@nyumc.org 
Supported by 25 signatories (see 
go.nature.com/2aotwpi for full list). 


CORRECTION 

The print version of the 
Correspondence by
M.W. Hayward et al. (Nature
534, 475; 2016) incorrectly 
stated that 16 rhinos will be 
moved from Africa to Australia 
this year; in fact, it will be 6. 
HEART DISEASE


Death-defying plaque cells 


Dead cells are usually removed through their ingestion and destruction by other cells. A study of plaque deposits in arteries 
shows that dying cells in plaques display a ‘don’t-eat-me’ signal that blocks their removal. SEE LETTER P.86 


IRA TABAS 


Heart attacks and strokes, which are
leading causes of death worldwide,
begin with a process called athero- 
sclerosis, in which plaques — accumulations 
of lipids, cells, extracellular matrix and cellu- 
lar debris — occur in certain areas of arteries. 
Although most people’s arteries contain many 
such plaques, only a small percentage will cause 
disease. On page 86, Kojima et al. provide a
plausible mechanism that could explain why 
some plaques become clinically dangerous. 

A key feature of clinically dangerous 
(‘vulnerable’) plaques is a structure called the 
necrotic core, which contains dead cells that 
have undergone a type of cell death known as 
necrosis. The necrotic core is inflamed and has 
a thinning fibrous cap that covers the plaque 
and separates it from the central lumen of 
the artery (Fig. 1). When the cap ruptures or
erodes, the necrotic material becomes exposed 
to circulating blood-cell fragments called 
platelets that are necessary for blood clotting. 
This exposure results in platelet aggregation 
(thrombus), which may block the blood
vessel and thereby cause a heart attack or stroke 
by depriving the heart or brain of oxygen. The 
necrotic core, which harbours inflammatory 
cellular debris, promotes cap disruption by 
contributing to the degradation of the cap’s 
structural protein, collagen, and by creating 
physical stress on the cap. Understanding how
the necrotic core develops is an urgent goal in 
heart-disease research. 

To determine how dying cells in plaques 
undergo necrosis, it is necessary to understand 
how the body normally prevents necrotic cell 
death. Billions of cells in the body die every 
day through a process called apoptosis, which 
initially prevents cell-membrane rupture and 
leakage of inflammatory cellular contents. 
Apoptotic cells are rapidly and safely removed 
by an evolutionarily conserved process called 
efferocytosis, in which the apoptotic cell is 
internalized and destroyed by an engulfing 
cell, called a phagocyte, before membrane 
rupture occurs. 

Efferocytosis requires signalling between 
the dying cell and the phagocyte: factors 


produced by the apoptotic cell promote the 
migration of phagocytes towards apoptotic 
cells, and ‘eat-me’ recognition markers on the 
surface of apoptotic cells are recognized by 
receptors on phagocytes. As a fail-safe mecha-
nism, healthy living cells often express ‘don't-
eat-me’ molecules on their cell surface that
signal to block phagocytes from internalizing 
a live cell. The CD47 protein is an example of 
a don't-eat-me molecule that signals through 
the SIRPα receptor protein on phagocytes to
inhibit apoptotic-cell engulfment.

What goes wrong in vulnerable plaques? 
Studies have shown that efferocytosis is defec- 
tive in ‘advanced’ human plaques that have not 
yet reached the vulnerable stage, and experi-
ments using genetically engineered mice have
demonstrated a causal relationship between 
defective efferocytosis and plaque necrosis. 
Thus, in advanced plaques, uncleared apop- 
totic cells eventually become leaky, resulting 
in a process called secondary necrosis.

Figure 1 | Defective removal of dead cells can contribute to clinically
dangerous atherosclerotic plaques. a, Many clinically dangerous plaques
contain a structure called the necrotic core, characterized by inflammation
and necrotic cell death. In atherosclerosis, if the fibrous cap covering the
plaque ruptures or erodes, release of material from the necrotic core can
trigger platelet aggregation (known as a thrombus) and arterial blockage,
which may result in heart attack or stroke. Understanding how plaques
develop to a necrotic state is a key question. b, Plaque cells undergo a
non-inflammatory type of cell death called apoptosis. In asymptomatic
non-necrotic plaques, rapid removal of apoptotic cells by engulfing cells
(a process known as efferocytosis) prevents necrosis. c, Kojima et al.
found that the inflammatory conditions of advanced atherosclerosis lead to
persistent expression of the protein marker CD47 on plaque cells through
the inflammatory-signalling mediator NF-κB. When these cells become
apoptotic, CD47 sends a signal through the SIRPα receptor on the engulfing
cell to block engulfment. The unengulfed cells undergo a type of cell death
called secondary necrosis, leading to the release of inflammatory molecules
and the formation of necrotic cores from the cell debris.

Why does efferocytosis become defective
in advanced atherosclerosis? Kojima and col-
leagues provide a plausible mechanism. They
made the surprising finding that in histological
sections from human and mouse plaques, 
unengulfed dying macrophage and vascular 
smooth muscle cells display the don't-eat- 
me signal CD47 on their surface. In a mouse 
model of atherosclerosis, the authors found 
that infusion of an antibody that blocks CD47 
improved efferocytosis in the plaque and less- 
ened formation of the necrotic core. On the 
basis of an in vitro model, they suggest that 
CD47 is transcriptionally induced by NF-κB,
which orchestrates inflammatory programs in 
cells, including plaque cells. Defective phago- 
cytic clearance of cells that die by another 
mechanism — an enzyme-triggered necrotic 
process called primary necrosis — may also 
contribute to the formation of the necrotic 
core, and here too the problem could involve
abnormal expression of CD47 (ref. 8). 

The complex nature of both atherosclero- 
sis and efferocytosis suggests that multiple 
mechanisms cause defective efferocytosis as 
plaques progress. Workers from Kojima and
colleagues’ laboratory previously showed
that dead cells in the plaque show a deficit in 
expression of the eat-me signal calreticulin 
protein. Moreover, the MerTK receptor pro- 
tein present on phagocytic macrophages, 
which mediates efferocytosis in advanced 
plaques, undergoes degradation in the same 
type of inflammatory condition in athero- 
sclerosis that Kojima and colleagues suggest 
leads to expression of CD47. The protease 
enzyme ADAM17 activates tumour necrosis
factor-α (TNF-α), which induces CD47 in vas-
cular smooth muscle cells, and ADAM17 also
destroys MerTK. Both ADAM17 activation
and cleavage of MerTK have been implicated
in the progression of human plaques towards
a clinically dangerous state.

How might our knowledge of defective 
efferocytosis in general, and the insights 
gained from the work of Kojima and col- 
leagues in particular, lead to future therapies 
to block the formation of dangerous plaques? 
Treatment with anti-TNF-α antibodies would
block CD47 induction, and this strategy has
been successful in debilitating autoimmune
diseases for which TNF-α is a dominant trig-
ger, notably rheumatoid arthritis. However,
in atherosclerosis, it is probable that inflam-
mation occurs through multiple pathways.
Another concern is that anti-TNF-α treatment
can compromise immune defences, which
would challenge its long-term use as a preven-
tive therapy in mostly asymptomatic people at
risk of acute heart disease.

Treatment with anti-CD47 antibodies, 
which is being tested as a cancer treatment in 
early clinical trials, presents other challenges.
CD47 is used by red blood cells to prevent their
premature engulfment before cell senescence,
and a major adverse effect of anti-CD47 ther-
apy is anaemia (a decrease in the number of
red blood cells). Moreover, CD47 has roles in
cell adhesion and migration, so its inhibition
might cause adverse effects related to these
functions in processes such as blood-vessel 
formation and immune defence. 

Another therapeutic strategy is based on the 
observation that many processes that generate 
vulnerable plaques, including inefficient effero- 
cytosis, can be caused by defects in a biological 
program known as resolution of inflamma- 
tion, which normally terminates an inflam- 
matory response when it is no longer needed, 
and initiates tissue repair. Administration of 
compounds that mediate this resolution pro- 
gram has proved beneficial in many preclinical 
models of resolution-defective diseases. For
example, such treatment can improve efferocy-
tosis and suppress plaque necrosis in advanced
atherosclerosis. Moreover, resolution-media-
tor therapy may actually boost host defence,
and this approach is now being tested in early
clinical trials targeting chronic inflammatory
conditions. These and other future develop-
ments based on work such as that of Kojima and
colleagues may some day provide a safe way to
keep the plaques in our arteries from becoming
clinically dangerous.
Bacteria synchronized
for drug delivery 


A synthetic genetic circuit that mimics the quorum-sensing systems used by
bacterial populations to coordinate gene expression enables bacteria to deliver 
drugs to mouse tumours in repeated and synchronized cycles. SEE LETTER P.81 


SHIBIN ZHOU 


Humans and bacteria have a long
history of parasitic and symbiotic
relationships. Now, Din et al. exploit
a relationship between bacteria and diseased 
human tissue for a therapeutic purpose. On 
page 81, the authors outline a system in which 
engineered bacteria acting as drug-delivery 
vehicles simultaneously break down, releasing 
an antitumour drug in synchronized cycles to 
maximize delivery efficiency and minimize 
toxicity. 

In the body, some niches for bacteria — such 
as the anaerobic lumen of the intestines — have 
low oxygen levels. Similar conditions are found 
in solid tumours because of increased oxygen 
demand owing to highly proliferative tumour 
cells and insufficient blood supply owing to a 
structurally and functionally abnormal tumour 
vasculature. The hypoxic areas in a tumour are
relatively protected from attacks by the body's
immune system, further facilitating bacterial
colonization and growth.

The idea of using bacteria to fight cancer has 
been around for more than a century. In 1891, 
surgeon William B. Coley infected patients 
with Streptococcus bacteria in an attempt to 
activate the immune system to fight cancer.
The method was controversial because of 
inconsistent efficacy and the toxicity of strep- 
tococcal infection. But the idea resurfaced 
later, when more was known about the 
tumour microenvironment and genetic- 
engineering tools had emerged, raising 
the hope that more-potent and less-toxic 
(attenuated) bacterial strains could be 
generated. Several bacterial strains have now 
been developed as agents for cancer therapy 
and they are showing promising effects in 
experimental models.

Figure 1 | Synchronized cyclical lysis and drug release. Din et al. constructed a genetic circuit in
tumour-targeting bacteria that mediates drug production and release through cell lysis in synchronized,
repeated cycles. a, Binding of the signalling molecule AHL to its receptor protein LuxR leads to their
subsequent interaction with, and activation of, the promoter DNA sequence PluxI. This promoter drives
expression of the gene that encodes the enzyme LuxI, which catalyses AHL synthesis, thus generating
a feed-forward loop. The promoter also drives expression of genes that encode a bacterial toxin (drug)
to kill cancer cells, and protein E, which releases the drug through bacterial-cell lysis. AHL can diffuse
freely in and out of cells (red arrows). When the density of the bacterial population is low, AHL primarily
diffuses out of the cell and the circuit is not active. When population density increases, intracellular AHL
accumulates, reaching a threshold concentration that activates the circuit in most cells. b, The cells lyse in
synchrony, releasing a burst of the drug. The few bacteria that survive kick off another cycle.

Bacteria can destroy diseased tissue by
competing for nutrients, secreting toxins
and eliciting host immune responses. They
can also be genetically engineered to have
extra antitumour activity. Compared with 
viruses, which have also been used in can- 
cer treatments, bacteria have substantially 
greater capacity to carry non-native DNA. It is 
routine practice in molecular biology to 
introduce DNA stretches of several kilobases 
into a bacterial host, and bacterial artificial 
chromosomes greater than 300 kb can be 
transferred to, and maintained in, Escheri-
chia coli. In principle, therefore, bacteria
can serve as efficient drug-delivery vehicles, 
carrying genetic circuits that encode and 
regulate therapeutic payloads. 

An ideal drug-delivery system for cancer 
therapy should deliver the substance selec- 
tively to the tumour to minimize harm to 
healthy tissues, and should release the drug in a
controlled manner. In their attempt to develop 
such a system, Din et al. focused on quo- 
rum-sensing circuits, which enable bacteria 
to communicate with one another, regulating 
gene expression in response to changes in 
population density. 

In a previous study’, the authors of the 
current work used a synthetic-biology 
approach to construct a quorum-sensing 
genetic circuit in E. coli. Three components 
— Lux, LuxR and acyl-homoserine lactone 
(AHL) — have crucial roles in this circuit. The 
enzyme LuxI catalyses synthesis of AHL mole- 
cules, and LuxR is an AHL receptor protein that 
activates a quorum-sensing transcriptional 
program. When bacterial population density 
is low, Lux] is expressed at a basal level. The 
AHL molecules synthesized as a result of LuxI 
expression do not accumulate in the cell, but 
instead rapidly diffuse out and become diluted 
in the extracellular environment. When popu- 
lation density rises, AHL accumulates in the 
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cell owing to the lowered diffusion gradient 
across the cell membrane. On reaching a 
threshold concentration in the cell, AHL mol- 
ecules bind to LuxR. In turn, LuxR activates 
a promoter DNA sequence called PluxI, which
drives expression of target genes. Notably, 
because AHL can diffuse across the cell mem- 
brane, it reaches similar concentrations in 
all bacterial cells in the growing population, 
ensuring synchronized execution of the gene- 
expression program. 

In the current study, Din et al. created a 
version of this genetic circuit that controlled 
synchronized and cyclical release of a bacte- 
rial toxin in attenuated Salmonella enterica 
serovar Typhimurium strains. In this system, 
PluxI promotes expression of genes encoding
four components: LuxI; the drug; a
fluorescent protein to enable monitoring of 
population dynamics and drug release; and 
protein E, a lysis protein from a bacterial 
virus called φX174 (Fig. 1). When the bacterial
population reaches the critical density
threshold, this PluxI-driven transcriptional
program is turned on in almost all cells, lead- 
ing to drug production and its subsequent 
release owing to breakdown of bacterial cells 
through lysis. A few outliers in the population 
survive and repopulate the niche. The result is 
periodic bacterial lysis and drug delivery. To 
demonstrate efficacy, Din and colleagues 
treated tumour-bearing mice with the engi- 
neered bacteria and showed that the bacteria 
exhibit synchronized cyclical population 
dynamics, and confer some therapeutic 
benefits either when administered alone or in 
combination with chemotherapy. 
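
The cycle described above (growth to a quorum threshold, synchronized lysis, regrowth from a few survivors) can be illustrated with a toy simulation. The sketch below is not the model used by Din et al.; the logistic growth law, every parameter value and the function name `simulate_lysis_cycles` are assumptions chosen only to make the oscillation visible.

```python
def simulate_lysis_cycles(steps=200, growth_rate=0.2, capacity=1000.0,
                          threshold=800.0, survivor_fraction=0.05, start=10.0):
    """Toy model of a synchronized-lysis circuit (all numbers are illustrative)."""
    population = start
    history = []    # population size recorded after each time step
    releases = []   # (step, number of cells lysed) for each release event
    for step in range(steps):
        # Logistic growth of the bacterial colony.
        population += growth_rate * population * (1 - population / capacity)
        # Quorum threshold reached: most cells lyse and release their payload,
        # and a small surviving fraction reseeds the next cycle.
        if population >= threshold:
            lysed = population * (1 - survivor_fraction)
            releases.append((step, lysed))
            population -= lysed
        history.append(population)
    return history, releases


history, releases = simulate_lysis_cycles()
print(f"{len(releases)} release events at steps {[s for s, _ in releases]}")
```

With these illustrative defaults the population never settles above the threshold, and the release events recur at roughly regular intervals, which is the qualitative behaviour the text describes.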

The authors did not directly compare the 
efficacy of their bacteria with that of microbes 
engineered in the conventional way to
continuously secrete the therapeutic protein.
Regardless, the new bacteria are notably dif- 
ferent from the conventional ones. First, drug 
delivery is achieved through the simultane- 
ous lysis of the entire population, rather than 
through continuous secretion by proliferating 
individuals. Second, the periodic lysis serves as 
a safety mechanism, because keeping the bac- 
terial population to a defined size minimizes 
the risk of an adverse systemic inflammatory 
response that might harm the patient. 

Despite these features, bacteria alone 
(whether engineered or not) are unlikely to
eradicate tumours. In the current study,
treatment of mice with the engineered 
microbes in combination with chemotherapy 
did not destroy the tumour; instead tumours 
shrank for 18 days, after which regrowth 
occurred. A curative therapeutic approach 
would most likely involve further improve- 
ments to engineered bacteria or using bacteria 
in combination with immunotherapy or other, 
more-powerful anticancer agents. 

Cyclical drug release could be more 
useful for treating people who have diseases 
that require periodic dosing, such as diabetes 
and high blood pressure. To treat non-can- 
cerous diseases, perhaps natural niches could 
be targeted for cyclical bacterial colonization. 
Alternatively, an implantable, semipermeable 
cassette that can be traversed by proteins and 
small molecules but not by bacteria could be 
developed to host the engineered microbes. 
One challenge to using periodically lysing 
bacteria to treat non-cancerous diseases 
that require long-term treatment is that the 
bacterial-degradation by-products released 
in each lysis cycle might be absorbed into the 
blood and build up, causing toxic systemic 
effects. Choosing less-toxic bacterial strains 
or creating attenuated strains (for example, 
by deleting the msbB gene, which is involved 
in making endotoxin*) could overcome the 
problem.
ATOMIC PHYSICS


A milestone in quantum computing


Quantum computers require many quantum bits to perform complex 
calculations, but devices with more than a few bits are difficult to program. 
A device based on five atomic quantum bits shows a way forward. SEE LETTER P.63 


STEPHEN D. BARTLETT 


Quantum-savvy entrepreneurs are
already bringing the first quantum
computer processors out of the physics
laboratory and onto the market. But these
devices are mostly designed to perform just 
one function and cannot be programmed to 
run different algorithms. It would therefore be 
advantageous to build a fully fledged quantum 
computer that could be programmed to run 
anything we might want. In particular, it might 
execute the complex quantum algorithms that 
researchers think will solve today’s intractable 
problems in quantum chemistry, materials 
science and data security. On page 63, Debnath 
et al.' present a small but fully programmable 
quantum computer consisting of five quantum 
bits (qubits), and they demonstrate its func- 
tionality by running several simple quantum 
algorithms. 

Debnath and colleagues’ computer is based 
on one of the oldest and most developed 
quantum architectures, which dates back to a 
design’ proposed by physicists Ignacio Cirac 
and Peter Zoller in 1995. In this design, the 
computer's qubits are individual atomic ions 
that are trapped in a line using magnetic 
fields and manipulated with lasers (Fig. 1). 
The trapped ions behave like a tiny crystal, 
and precisely controlled vibrations along this 
line can cause the ions to become ‘entangled’.
Entanglement is a key ingredient of quantum 
computing whereby two or more particles 
share a common state, such that each particle
can no longer be described independently. 
Unlike most other quantum-computer archi- 
tectures, the operations used to entangle the 
particles are not restricted only to neighbour- 
ing qubits. 

Decades of research into precision metrol- 
ogy, such as the development of atomic clocks, 
now allow the quantum electronic states of 
trapped ions to be manipulated at an exqui- 
site level of control and stability. Debnath and 
collaborators took advantage of this work 
and have also made several improvements to 
Cirac and Zoller’s design, including the ability 
to target each ion (in this case, five ytterbium 
ions) individually with optical lasers. The net 
result is an elementary quantum processor in 
which every basic operation — initializing the 
states of the qubits, transforming them, entan- 
gling any pair of ions, and reading out the ions’ 
quantum state — can be performed with errors 
occurring less than 2% of the time. 

The accuracy of the authors’ quantum 
processor allowed them to develop preset 
quantum logic gates that enact a desired 
sequence of laser pulses to generate an ele- 
mentary component of a quantum circuit. In 
addition, they built a compiler that can take a 
quantum program — an algorithm designed 
to exploit some aspect of quantum mechanics 
to solve a mathematical problem — and deter- 
mine how to operate the hardware to run the 
program. 

Figure 1 | Design of an atomic quantum computer. Debnath et al. have constructed a quantum
computer that is based on an earlier design. The computer’s quantum bits (qubits) are individual atomic
ions that are trapped in a line using magnetic fields and electrodes. The ions are carefully manipulated
by lasers so as to vibrate. The vibration of two ions allows them to become ‘entangled’, enabling quantum
computations to be performed.

The problems that can be solved by a small
computer with only five qubits are limited:
they could be solved faster with even the most 
sluggish conventional laptop. But nonetheless, 
running simple algorithms can yield valuable 
information about the performance of the 
quantum processor as a whole, even when the 
outcome of the algorithm is already known. 
Why? A key concern for quantum architects 
is that qubits may seem to operate well when 
viewed individually, but can fail in unknown 
ways when required to work in tandem with 
many other qubits as part of a complex sys- 
tem. Simple algorithms are therefore used as a 
benchmark to see how several qubits function 
when combined in a larger circuit. 

Debnath et al. demonstrate several algo- 
rithms. These include the Deutsch-Jozsa° 
and Bernstein-Vazirani‘’ algorithms, which 
both use quantum effects to perform a math- 
ematical calculation in a single step, whereas a 
conventional computer would require several 
operations. They also demonstrate a quantum 
Fourier transform, which is a key component
of many of the heftier quantum algorithms, 
such as those used to break encryption. In all of 
these demonstrations, the resulting error rate 
is consistent with the authors’ observations of 
how their qubits work in isolation, showing 
that the qubits can be used together in more- 
sophisticated algorithms in the future. 
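
The single-step behaviour of the Bernstein-Vazirani algorithm can be checked with a small classical state-vector simulation. This is only an illustrative sketch, not the authors’ trapped-ion implementation or their compiler: the function names are hypothetical, and the oracle is written in its phase form (a sign flip on every input whose bitwise overlap with the hidden string has odd parity), which is equivalent to the textbook version with an ancilla qubit.

```python
import math


def hadamard_all(state, n_qubits):
    """Apply a Hadamard gate to each of n_qubits in a state vector."""
    state = list(state)
    for q in range(n_qubits):
        bit = 1 << q
        for i in range(len(state)):
            if not i & bit:  # visit each pair (i, i | bit) exactly once
                a, b = state[i], state[i | bit]
                state[i] = (a + b) / math.sqrt(2)
                state[i | bit] = (a - b) / math.sqrt(2)
    return state


def bernstein_vazirani(secret, n_qubits):
    """Recover the hidden bit string `secret` with one oracle query."""
    state = [0.0] * (2 ** n_qubits)
    state[0] = 1.0                       # start in |00...0>
    state = hadamard_all(state, n_qubits)
    # The single oracle query: multiply the amplitude of x by (-1)^(secret . x).
    state = [-amp if bin(secret & x).count("1") % 2 else amp
             for x, amp in enumerate(state)]
    state = hadamard_all(state, n_qubits)
    probabilities = [amp * amp for amp in state]
    outcome = max(range(len(probabilities)), key=probabilities.__getitem__)
    return outcome, probabilities[outcome]


outcome, probability = bernstein_vazirani(secret=0b1011, n_qubits=4)
print(f"measured {outcome:04b} with probability {probability:.3f}")
```

A classical procedure would need one query per bit of the hidden string, whereas the simulated circuit calls the oracle once. On real hardware the measured probability falls below one because of gate and readout errors, which is what makes such algorithms useful as benchmarks.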

There is still a long way to go before quan- 
tum computers can reach their full potential. 
For the trapped-ion architecture explored 
here, researchers have already hit the limit of 
the number of ions that can be placed in a line 
in a single trap (around a dozen). The future
of this field is believed to involve either joining 
many such traps together using optical quan- 
tum couplers, or shuttling ions between inter- 
action zones in microfabricated traps that have 
a 2D layout’. The latter approach also offers 
the tantalizing possibility of low error rates for 
basic logic operations, perhaps even just one 
error in every thousand operations — a fig- 
ure commonly thought to be the highest error 
rate that a large-scale quantum computer could 
tolerate. Research in these directions has been 
encouraging, but it may be a while before these 
scalable approaches can reproduce even the 
five-qubit results demonstrated by Debnath 
and colleagues’ quantum computer. 
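
To see why the per-operation error rate matters so much, consider a rough estimate of how errors accumulate over a circuit. The sketch below assumes that errors in successive operations are independent and that no error correction is applied; both are simplifying assumptions, not claims from the paper. The 2% figure is the upper bound quoted above, and the function name is hypothetical.

```python
def circuit_success_probability(error_per_operation, n_operations):
    """Probability that no operation fails, assuming independent errors."""
    return (1.0 - error_per_operation) ** n_operations


for n_ops in (10, 100, 1000):
    current = circuit_success_probability(0.02, n_ops)   # 2% error per operation
    target = circuit_success_probability(0.001, n_ops)   # 1 error in 1,000 operations
    print(f"{n_ops:5d} operations: {current:.3g} (2%) vs {target:.3g} (0.1%)")
```

Under these assumptions the chance of an error-free run shrinks geometrically with circuit length, so even a modest reduction in the per-operation error rate greatly extends the size of circuit that can run reliably.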

Trapped-ion quantum architectures are not 
the only game in town. A range of other solid- 
state, atomic and optical quantum systems 
each have different advantages for quantum 
computing. Notably, an approach using qubits 
built of superconducting circuitry — con- 
sidered the dark horse of quantum comput- 
ing research only a decade ago — has shown 
enormous recent success”"’. Not only can 
superconducting technologies now compete 
with the phenomenal precision that has been 
shown with trapped ions, but they can also 
operate at much higher speeds and may have a 
clearer pathway to being scaled up. 

A programmable five-qubit quantum 
computer built using superconducting
circuits has now also been demonstrated", 
and has similar capabilities to Debnath and 
colleagues’ device. Both the superconducting- 
circuit and ion-trap approaches seem to be 
capable of being scaled up to larger devices that 
have more quantum bits. The next challenge 
for all of these technologies is to demonstrate 
that quantum error correction can bring error 
rates down to negligible levels.
Protection for anaesthetized mice


A cognition-enhancing drug called CX546 prevents the neurodegenerative
effects of repeated anaesthesia in infant mice by promoting neuronal changes 
associated with learning and by protecting neurons from death. 


LAURA CORNELISSEN & CHARLES BERDE 


Millions of children have surgery
under general anaesthesia each
year. Studies of infant animals show
that neurodegeneration and long-term neuro- 
behavioural impairments arise when general 
anaesthesia is used at crucial periods of brain 
development, especially following high doses 
and prolonged exposures to anaesthetics’. 
Studies in humans have been controversial — 
some have reinforced these findings, particu- 
larly among infants who have had multiple 
anaesthetics and surgeries, whereas a case–control
study and an interim analysis of a
recent randomized trial have been more reas- 
suring. Writing in Science Translational Medi- 
cine, Huang et al.” report that a drug called 
CX546 confers neuroprotection in infant mice
that are repeatedly exposed to the anaesthetic 
molecule ketamine. 

Ketamine is thought to act predominantly 
by blocking signalling through NMDA recep- 
tor proteins’, which are activated by the 
excitatory neurotransmitter molecule glu- 
tamate. The authors examined the effects of 
ketamine anaesthesia on neuronal activity 
in the brains of infant mice. In vivo imaging 
experiments revealed that neuronal activity 
decreased during post-anaesthesia recovery 
in treated animals compared to untreated 
animals. Analysis of proteins at the syn- 
aptic junctions between neurons showed 
that, after repeated anaesthesia, expression 
of NMDA receptors was reduced in adulthood,
as was expression of another class of
glutamate receptor proteins, AMPA receptors.

Figure 1 | Combating ketamine. a, In infant mice, repeated exposure to anaesthesia using the molecule
ketamine inhibits neuronal signalling mediated by AMPA and NMDA receptor proteins, and prevents
remodelling of tiny neuronal structures called dendritic spines (dashed line indicates a spine
whose remodelling was prevented). This causes neuronal death and weakening of neural circuits, leading
to defects in learning and motor performance. b, Huang et al. report that a drug called CX546, which
increases AMPA and NMDA receptor expression and so promotes excitatory neurotransmission, can rescue these
defects, preserving normal neurodevelopment.

CX546 is part of a group of cognition- 
enhancing drugs called AMPAkines that assist 
excitatory neurotransmission through AMPA 
receptors. Huang and colleagues found that 
CX546 prevented ketamine-induced death 
of brain neurons in infant mice, restored the 
expression of AMPA and NMDA receptors, 
and preserved neuronal activity in vulnerable 
brain regions. Moreover, the drug improved 
neurobehavioural outcomes, for example by 
rescuing the learning deficits that are asso- 
ciated with repeated ketamine anaesthesia 
(Fig. 1). Finally, the authors showed that 
CX546 partially rescued the remodelling of 
dendritic spines — tiny neuronal structures 
whose formation and elimination are crucial 
for processes such as learning. These struc- 
tures cannot be correctly remodelled following 
repeated ketamine anaesthesia”. 

Glutamate is the most prominent excitatory 
neurotransmitter in the central nervous sys- 
tem, and plays a crucial part in the processes 
of neural-circuit strengthening (through sus- 
tained activity) and elimination (through weak 
activity). CX546 enhances glutamate-mediated 
neurotransmission and strengthens circuits by 
increasing neuronal activity. This is probably 
how the drug provides neuroprotection when 
it is given immediately after the periods of low 
neuronal activity that follow repeated keta- 
mine anaesthesia. 

Several AMPAkines are already in clinical 
trials in adults as treatments for a range of con- 
ditions, including Parkinson's disease, schizo- 
phrenia and autism’. Huang et al. speculate 
that CX546 might hold promise as a therapy 
to prevent neuronal defects in human infants 
undergoing surgery and anaesthesia. 

Widely varying recommendations and 
mitigation strategies have been proposed in 
response to the debate around anaesthetic- 
induced neurotoxicity in human infants”®. In 
2012, a public-private collaboration between 
the US Food and Drug Administration and 
the International Anesthesia Research Soci- 
ety, called SmartTots, recommended delaying 
elective surgery that uses general anaesthesia 
until patients are at least three years old when- 
ever possible'’. Subsequently, these authorities 
amended their recommendations to advo- 
cate balancing the risks and benefits to guide 
individual treatment decisions”. 

For many types of paediatric surgery, such 
a delay is either not feasible or would probably
do more harm than good. For instance, 
delayed repair of congenital heart defects could 
cause death or neurological deficits — a much 
greater risk than the potential consequences 
of exposure to a general anaesthetic. Similarly, 
many head and neck procedures performed 
during infancy and early childhood foster 
optimal neurodevelopment by correcting 
impairments in hearing, vision, speech or 
feeding, or by removing airway obstructions. 
In these cases, avoiding general anaesthesia is 
simply not practical. However, local or regional 
anaesthesia can sometimes be used to reduce 
dose requirements for general anaesthetics, to 
provide postoperative pain relief, or, in selected 
cases, to act as a primary anaesthetic’. 

Alternative anaesthetic agents such as 
dexmedetomidine and xenon are currently 
under investigation for use in the clinic, on 
the basis of animal data’’ suggesting that they 
cause less neurotoxicity than ketamine or the 
widely used inhalation anaesthetics sevoflu- 
rane and isoflurane. Dexmedetomidine and 
xenon are not complete anaesthetics by them- 
selves at clinically achievable doses, and there 
are some practical barriers to implementation. 
Nonetheless, both hold some promise for 
drastically reducing the doses of conventional 
anaesthetics that are required to maintain a 
general anaesthetic state. 

Several questions should be addressed 
before testing CX546 in clinical trials. First, 
is the neuroprotective action of CX546 spe- 
cific for ketamine, or can the drug also pro- 
tect against anaesthetics that act on different 
neuronal circuits? Second, how crucial is the 
timing of CX546 administration for its neuro- 
protective effect? Huang et al. gave CX546 to 
infant mice after anaesthesia, but delivering it 
during anaesthesia and surgery might change 
the dose requirements. 

Third, because AMPAkines have been 
shown to stimulate respiration”, it is possible 
that the neuroprotective benefit of CX546 is 
partly due to respiratory stimulation. This in 
turn remedies the low levels of oxygen and 
high levels of carbon dioxide in the blood that 
could be induced by ketamine and other anaes- 
thetics. Huang et al. did not analyse respiration 
during or after anaesthesia, so this remains to 
be investigated. Finally, the potential adverse 
effects of exposing brains to CX546 during 
crucial periods of their development must be 
assessed. 

Preventive medicines have previously 
caused considerable harm”. For a medication 
to be beneficial in a preventive role, the average 
number of patients that can be treated before 
one person is harmed (the ‘number needed to 
harm’) must be extremely high, and the aver- 
age number of patients who must be treated 
before one extra person benefits compared to 
the previous regime (the ‘number needed to 
treat’) must be comparatively low. Clinical-trial 
designs for a CX546 neuroprotection study in
infants undergoing surgery would face serious
challenges’®, and would require an extensive 
preclinical toxicology programme that tested 
infant animals from several species. 
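
Both quantities are conventionally computed as the reciprocal of an absolute difference in event rates between treated and untreated groups. The sketch below uses that standard definition; the helper name `number_needed` and all of the rates are hypothetical, chosen only for illustration and not drawn from any trial.

```python
import math


def number_needed(event_rate_a, event_rate_b):
    """Reciprocal of the absolute difference between two event rates."""
    difference = abs(event_rate_a - event_rate_b)
    return math.inf if difference == 0 else 1.0 / difference


# Hypothetical rates, for illustration only (not data from any study).
nnt = number_needed(0.10, 0.05)    # poor outcome: 10% untreated vs 5% treated
nnh = number_needed(0.002, 0.001)  # serious side effect: 0.2% treated vs 0.1% untreated
print(f"number needed to treat: {nnt:.0f}")
print(f"number needed to harm:  {nnh:.0f}")
```

A preventive drug looks attractive only when the second number is far larger than the first. When the baseline risk of the outcome being prevented is small, the number needed to treat grows quickly, which is one reason that trial design in this setting is demanding.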

From the standpoints of ethics, risk-benefit 
and effect size, it might be appropriate to con- 
duct a clinical trial among infants who are 
already having repeated or prolonged surgical 
procedures or who require long-term seda- 
tion. These infants have complex and varied 
medical conditions and a range of confound- 
ing factors that would make a prevention 
trial difficult, but not impossible. It would be 
unethical not to use a trial design that is ran- 
domized, double blind, prospective (one that 
studies subjects after enrolment, rather than 
retrospectively) and controlled'*"®. 

Huang et al. are to be commended for an 
innovative study, which introduces a plausible, 
mechanism-based potential preventive treat- 
ment for anaesthetic-induced neurotoxicity in 
infants. Their work adds to our understanding 
of the mechanisms that underpin ketamine’s 
activity, its effects and the potential interven- 
tions that could optimize neurodevelopment in 
infants undergoing surgery and anaesthesia.
Still a geneticist’s nightmare


The largest DNA-sequencing study of type 2 diabetes conducted so far concludes 
that, contrary to expectation, low-frequency and rare genetic variants do not 
contribute significantly to disease risk. SEE ARTICLE P.41 


STEPHEN S. RICH 


Type 2 diabetes is a major cause of
illness and death, particularly in people
of African, Hispanic and Asian ancestry.
Despite the indications of strong familial
origins’, it was not until 2012 that genetic vari- 
ants associated with the disease were robustly 
established, thanks to a genome-wide asso- 
ciation study (GWAS) that looked for common 
risk variants in more than 100,000 people from 
several ancestral populations’. Nonetheless, 
only around 10% of the risk attributable to 
genetic factors has been identified. An obvious 
next approach is to interrogate the genome for 
infrequent and rare variants that could affect 
risk individually or in aggregate. On page 41, 
Fuchsberger et al. present a comprehensive
evaluation of the role of rare and infrequent 
variants in the risk of developing type 2 diabetes. 

The authors sequenced exomes, which
encompass protein-coding regions (about
1-2% of the human genome), from around 
6,500 people with type 2 diabetes and 
6,400 healthy controls, from 5 ancestry 
groups. They also sequenced whole genomes 
from some 1,300 people with the disease 
and 1,300 ancestry-matched controls. After 
rigorous quality control, the whole-genome 
sequences revealed approximately 27 mil- 
lion sequences or bases that varied between 
individuals. 

Fuchsberger and colleagues found that 126 of 
these variants, each located in one of four genes, 
were significantly associated with an altered risk 
of type 2 diabetes. Two of the genes, TCF7L2 
and ADCY5, had been previously identified as 
containing commonly occurring variants asso- 
ciated with diabetes risk, and a risk-associated 
variant in a third, CCND2, had been found to 
occur at low frequency’. The final gene, EML4, 
contained a common variant that had not been 
identified from the GWAS, but this discovery
was not replicated in a larger data set. 

Crucially, there was no significant evidence 
for rare or low-frequency disease-associated 
variants in regulatory elements, which modu- 
late gene expression, or in coding sequences. 
Combining samples that had undergone 
both exome and whole-genome sequencing 
revealed only one more risk variant, in the 
gene PAX4; this variant had been previously 
detected in people from East Asia*®. 

To broaden their search, the authors 
increased the number of subjects to around 
90,000. They analysed the participants’ DNA
using a customized array — a tool that allowed 
the analysis of specific coding sequences in 
which diabetes-associated variation might 
arise, thereby generating fewer raw data than 
genome sequencing. In this expanded data 
set, they found another 18 common variants 
in 13 genes. Only one of these, in MTMR3, had 
not previously been identified by the GWAS. 
Fuchsberger and colleagues conclude that 
there is little evidence that rare or infrequent 
variants affect the risk of developing diabetes. 
Instead, the authors suggest that almost all 
significantly diabetes-associated variants are 
common in the population and have been pre- 
viously detected by the GWAS. 

If the contribution of common variants to 
the genetic risk of diabetes is relatively limited, 
and if there is little support for a contribution 
from rare and low-frequency variants, where 
are the culprits hiding? Once dubbed “a geneti- 
cist’s nightmare””’, diabetes seems to be living 
up to its reputation. What are researchers miss- 
ing? Could other genetic tools be applied? 

One approach to improving the search would 
be to add more whole-genome sequences, in 
the hope that a larger population will enable 
smaller effects to be detected. Size does matter 
in genomic studies, but the current work sug- 
gests that increased sample size will not much 
increase the number of risk-associated genes 
or variants. However, the cost of increasing 
sample size might be surprisingly low. Several 
initiatives from the US National Institutes of 
Health, including the Trans-Omics for Precision 
Medicine program, the Centers for Common 
Disease Genomics project and the Personalized 
Medicine Initiative, will provide whole-genome 
sequences and associated data on phenotypes 
(the severity of a range of diabetes-associated 
traits) free of charge. Even if the expected yield 
of such analyses is low, finding a handful of rare 
variants — for example, those that confer loss 
of gene function, such as one found in PCSK9 
(ref. 8) — could have a major impact. 

There are many alternatives to increasing 
sample size. These include focusing only on the 
variants that confer an increased risk of harm- 
ful phenotypes’, analysing diverse populations 
or those known to be at low risk of disease’®, 
and considering other genetic ‘architectures’,
such as whether the disease is caused by a mod- 
est contribution from many common variants, 
rather than a large contribution from a few rare
ones. Moreover, genetic risk should be considered
in the context of the complex environmental
risk factors with which it is inextricably
intertwined. For example, being overweight 
does not always lead to diabetes, but fatty tissue 
increases insulin resistance, and abdominal 
fat increases the risk of diabetes more than 
does fat in hips and thighs. Physical activity not 
only controls weight, but also uses glucose as 
energy and increases cellular insulin sensitiv- 
ity. An understanding of genetic risk factors is 
needed to elucidate the mechanisms that lead 
to such complex, variable phenotypes. 

Many genes and variants, both common 
and rare, could influence the declining function
of the β-cells that store and release insulin.
However, relatively small sets of variants might 
be sufficient to elevate risk in the context of 
other genetic or environmental factors. There 
might be cassettes of variants unique to an 
individual — called private variants — that 
lead to diabetes only in certain conditions. In 
this scenario, the large case-control design is 
not necessarily optimal, because the averag- 
ing of effects will obscure crucial gene sets. 
Family-based designs” and efforts to further 
dissect the subtleties of each phenotype might 
be required to identify private variants’. 

Historically, to obtain a grasp of risk-asso- 
ciated variants, genetic studies took advan- 
tage of extremes of disease presentation (such 
as mild or severe phenotypes, or early or late 
diagnosis”), or of the occurrence of disease in 
low-risk groups, or of populations in which 
private variants might contribute to disease 
risk. Although such studies have already been 
performed, it might be time to revisit these 
approaches using whole-genome-sequence
and refined phenotypic data. 

The authors’ study marks the first large-scale 
use of whole-genome sequencing data to tackle 
this incredibly complex disease. The conclu- 
sion that rare and infrequent variants have 
little effect on risk under this study design and 
in these populations is important. However, it 
may be that Fuchsberger et al. have eliminated 
only the rare variants identifiable from a case- 
control study. The genetics of type 2 diabetes 
might still be a nightmare, but nonetheless the 
search continues.
Destruction of discrete charge


Electric charge is quantized in units of the electron’s charge. An experiment explores 
the suppression of charge quantization caused by quantum fluctuations and 
supports a long-standing theory that explains this behaviour. SEE LETTER P.58 


YULI V. NAZAROV 


Because matter is constructed of
elementary particles, the electric charge
of any object is an integer multiple of the
elementary charge, which is equivalent in size 
to the electron’s charge. This concept is known
as charge quantization. In 1913, charge quan- 
tization (and thus, the existence of elementary 
particles) was demonstrated by the physicist 
Robert Millikan, who measured the charges of 
single electrons in oil drops containing many
billions of particles. However, quantum physics
predicts that charge quantization can be 
destroyed by tiny quantum fluctuations. On 
page 58, Jezouin et al.’ describe an experiment 
to control these fluctuations. Their results sug- 
gest that the effect of the fluctuations on charge 
quantization can be explained through particle- 
like phenomena called Korshunov instantons’. 

Figure 1 | A nanodevice to investigate charge quantization. Jezouin et al. have studied the effect
of quantum fluctuations on the quantization of electric charge. Their device consists of a micrometre-scale
metallic island, which stores a discrete charge. The device is built on a semiconducting material,
the surface of which supports a thin layer of electron gas (green). When a voltage is applied to two
pairs of metallic ‘gate’ electrodes, the electrons of the gas are repelled, leading to regions in which
the gas is depleted (grey) and allowing current to flow through the device (red arrows). The island is
connected to two other electrodes (the left and right electrodes) through separate junctions. Electrons
can travel through the junctions because of quantum tunnelling (dotted lines) with a probability that is
controlled using the gate electrodes. Charge quantization is observed as Coulomb oscillations, a periodic
dependence of the device’s conductance on the voltage of one of the gate electrodes. The authors have
measured the visibility of these oscillations against the voltage-independent component of conductance
as the transmission probabilities of the junctions and the temperature are varied.

In the current age of nanoscience, charge
quantization has enabled the manipulation
of single electrons in nanostructures,
with applications in metrology, sensing and
thermometry. These nanostructures consist
of conducting, metallic islands that store the 
single charges, and which are analogues of 
Millikan’s oil drops. The islands are connected 
to electrodes and to each other by tunnel 
junctions — barriers between two conducting 
materials, such as thin layers of an insulator. 
Other ‘gate’ electrodes are used to control the 
discrete charges on the islands. 

The most common manifestation of charge 
quantization in such nanostructures is the 
presence of Coulomb oscillations’: a periodic 
dependence of the structure’s conductance on 
the voltage of one of the gate electrodes. How- 
ever, if the islands are not well isolated, the con- 
ductance of the connecting junctions produces 
quantum fluctuations of charge that suppress 
the quantization, reducing the visibility of 
the Coulomb oscillations against the voltage- 
independent component of conductance. 

The aim of Jezouin and colleagues’ experi- 
ment’ is to fully control these quantum fluc- 
tuations. Their set-up is remarkably simple: 
a single metallic island is connected to two 
electrodes through separate junctions (Fig. 1). 
Unlike in previous attempts to control the 
fluctuations (see, for example, refs 5-7), each 
junction has only one conduction channel (a 
pathway through which an electron can travel). 
Therefore, the conduction of each junction is 
directly related to its transmission probability, 
the probability that an electron will be trans- 
mitted through the junction. 

The authors are able to vary and quantify 
the transmission probability of each junction. 
To achieve this level of control, the conduction 
channels are formed in a semiconductor 
material, the surface of which supports a thin 
layer of electron gas. The authors’ technologi- 
cal advance is to connect this two-dimensional 
gas to the metallic island. 

Using this set-up, the authors measure the 
visibility of the Coulomb oscillations as two 
quantities — the transmission probabilities of 
the two junctions and the temperature — are 
varied. They find that the visibility of the oscilla- 
tions is reduced if either of the junctions’ trans- 
mission probabilities is increased. Furthermore, 
as the temperature is increased, thermal fluctu- 
ations result in an exponential suppression of 
the oscillations. What is the significance of these 
two ‘scaling laws’? 

Unlike classical fluctuations, which tend to 
destroy the order of a system, quantum fluc- 
tuations can actually generate alternative order. 
For example, large fluctuations in the position 
of a quantum particle imply that the particle's 
momentum has a precise value, whereas large 
fluctuations of momentum imply a precise 
position. Electric charge and magnetic flux 
are similarly connected in superconducting 
nanostructures, so that increases in charge 
fluctuation convert charge quantization to flux 
quantization. 

States of quantized flux do not exist in 
normal, non-superconducting metals. How- 
ever, in 1987, the physicist Sergey Korshunov 
unexpectedly discovered that two flux quanta 
can be transferred between non-quantized 
flux states without the requirement for superconductivity, 
owing to the presence of particle-like 
phenomena called instantons. Since 
then, the manifestations of these Korshunov 
instantons have been thoroughly investigated. 
In 1999, the concept was extended to arbitrary 
transmission probabilities, leading to two 
specific predictions that describe the influence 
of instantons on charge quantization. 

These predictions are precisely the scaling 
laws observed by Jezouin and collaborators. 
The authors’ results therefore strongly suggest 
that charge quantization over the whole range 
of fluctuation strength is governed by Kors- 
hunov instantons, and provide experimental 
evidence of these long-predicted ‘particles’. 

The Korshunov instanton is a close relative 
of another concealed ‘particle’, the leviton. 
Levitons are generated when a carefully engi- 
neered voltage pulse, encompassing precisely 
two flux quanta, is applied to a collection of 
electrons in a nanojunction. This phenomenon 
was confirmed experimentally in 2013. 
Whereas the leviton is an excitation of elec- 
tron systems, the instanton is a property of the 
electronic ground state that, in the presence of 
electrostatic (Coulomb) interactions, governs 
charge quantization. 

One puzzle left by the authors’ experiment 
is why the scaling laws are consistent with the 
theoretical predictions for a single instan- 
ton. Theoretical considerations suggest that 
Jezouin and colleagues’ system would involve 
configurations of many instantons, which 
would lead to more-complex scaling laws 
than those observed. Their results may imply 
that the theory that underlies such systems is 
simpler than it seems and has unknown sym- 
metries that would cancel out many-instanton 
configurations. These theories could be tested 
using more-detailed measurements, espe- 
cially of the island capacitance. One could also 
construct more-sophisticated set-ups that 
combine levitons and instantons. Greater 
control of these concealed particles could then 
be achieved by studying their interference. 

Charge quantization is a simple concept; 
however, observing the effect of quantum 
fluctuations can lead to fascinating discover- 
ies. Chasing exotic particles in nanostructures 
will help us to understand the complexity of 
quantum laws of electricity, with the hope 
of eventually applying this knowledge to 
quantum information processing. 
The genetic architecture of type 2 diabetes 


A list of authors and affiliations appears in the online version of the paper 


The genetic architecture of common traits, including the number, frequency, and effect sizes of inherited variants that 
contribute to individual risk, has been long debated. Genome-wide association studies have identified scores of common 
variants associated with type 2 diabetes, but in aggregate, these explain only a fraction of the heritability of this disease. 
Here, to test the hypothesis that lower-frequency variants explain much of the remainder, the GoT2D and T2D-GENES 
consortia performed whole-genome sequencing in 2,657 European individuals with and without diabetes, and exome 
sequencing in 12,940 individuals from five ancestry groups. To increase statistical power, we expanded the sample size 
via genotyping and imputation in a further 111,548 subjects. Variants associated with type 2 diabetes after sequencing 
were overwhelmingly common and most fell within regions previously identified by genome-wide association studies. 
Comprehensive enumeration of sequence variation is necessary to identify functional alleles that provide important clues 
to disease pathophysiology, but large-scale sequencing does not support the idea that lower-frequency variants have a 
major role in predisposition to type 2 diabetes. 


There is compelling evidence that the individual risk of type 2 diabetes 
(T2D) is strongly influenced by genetic factors. Progress in characterizing 
the specific T2D-risk alleles responsible has been catalysed by 
the ability to perform genome-wide association studies (GWAS). Over 
the past decade, successive waves of T2D GWAS (featuring ever larger 
samples, progressively denser genotyping arrays supplemented by 
imputation against more complete reference panels, and richer ethnic 
diversity) have delivered more than 80 robust association signals. 
However, in these studies, the alleles interrogated for association were 
predominantly common (minor allele frequency (MAF) > 5%), and 
with limited exceptions, the variants driving known association 
signals were also common, with individually modest impacts on T2D 
risk. Variation at known loci explains only a minority of observed 
T2D heritability. 

Residual genetic variance is partly explained by a long tail of common 
variant signals of lesser effect. However, the contribution to T2D 
risk that is attributable to lower-frequency variants remains a matter 
of considerable debate, not least because of the relevance of disease 
architecture to clinical application. 

Next-generation sequencing enables direct evaluation of the role of 
lower-frequency variants in disease risk. This paper describes the 
coordinated, complementary strategies pursued by the 
Genetics of Type 2 Diabetes (GoT2D) and Type 2 Diabetes Genetic 
Exploration by Next-generation sequencing in multi-Ethnic Samples 
(T2D-GENES) consortia. GoT2D collected comprehensive genome- 
wide sequence data from 2,657 T2D cases and controls; T2D-GENES 
focused on exome sequence variation, assembling data (after inclu- 
sion of GoT2D exomes) from a multiethnic sample of 12,940 indi- 
viduals. Both consortia used genotype data to expand the sample size 
available for association testing for a subset of the variants exposed by 
sequencing. 


Analysis of genome-wide variation 

The GoT2D consortium selected for whole-genome sequencing 
cases of type 2 diabetes (T2D) and ancestry-matched normogly- 
caemic controls from northern and central Europe (Methods and 
Supplementary Table 1). To increase power to identify low-frequency 
(0.5% < MAF <5%) and rare (MAF < 0.5%) T2D variants with large 
effects, we preferentially identified individuals from the extremes of 
genetic risk (Methods). The genome sequence of 1,326 cases and 1,331 
control individuals was determined through joint statistical analysis 
of low-coverage whole-genome sequence (~5×), deep-coverage 
exome sequence (~82×), and array-based genotypes at 2.5 million 
single nucleotide variants (SNVs) (Extended Data Fig. 1 and Extended 
Data Table 1). 

We detected, genotyped, and estimated haplotype phase for 
26.7 million genetic variants (Extended Data Fig. 1 and Extended 
Data Table 2), including 1.5 million short insertion-deletion variants 
(indels) and 8,876 large deletions. Individual diploid genomes carried 
a mean of 3.30 million variants (range: 3.20 million to 3.35 million), 
including 271,245 indels (262,201-327,077), and 669 (579-747) large 
deletions. These data include many variants not directly studied by 
previous GWAS, including all of the indels as well as 420,473 common 
and 2.4 million low-frequency SNVs that were poorly tagged 
(r² < 0.30) by genotype arrays. We estimate near-complete ascertainment 
(98.2%) of SNVs with minor allele counts of greater than 5 
(MAF > 0.1%), and high accuracy (over 99.1%) at heterozygous 
genotypes (Methods and Fig. 1a). As half of the sequenced individuals 
were T2D cases, ascertainment was enhanced for any rare or 
low-frequency variants that substantially increase T2D risk (Fig. 1a). 
Specifically, we estimate >80% power to detect (at genome-wide 
significance, α = 5 × 10⁻⁸) T2D risk variants with MAF > 5% and 
odds ratio (OR) > 1.87, or MAF > 0.5% and OR > 4.70 (Extended Data 
Fig. 2). 

We tested all 26.7 million variants for T2D association by logistic 
regression assuming an additive genetic model (Supplementary Table 2). 
Analyses using a mixed-model framework to account for popula- 
tion structure and relatedness generated almost identical results. 
At genome-wide significance, 126 variants at four loci were associ- 
ated with T2D (Fig. 1b). These included two previously reported 
common-variant loci (TCF7L2 and ADCY5), a previously reported 
low-frequency variant in CCND2 (ref. 7) (rs76895963, MAF = 2.6%, 
Pseq = 4.2 x 10°), and a novel common-variant association near EML4 
(MAF = 34.8%, Pseq = 1.0 x 10°). There was no significant evidence of 
association with T2D for sets of low-frequency or rare variants within 
coding regions, nor within specified non-coding regulatory elements 
(Methods). 
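
To make the additive genetic model concrete, the following is a minimal sketch of a single-variant logistic regression in which the log odds of disease change linearly with allele dosage (0, 1 or 2 copies). It is not the consortium's pipeline: it has no covariates, ignores relatedness, uses a Wald test, and the function name `additive_logistic` and the genotype counts are hypothetical.

```python
import math

def additive_logistic(cases, controls, iterations=25):
    """Fit logit P(case) = b0 + b1 * dosage by Newton's method.

    cases[g] and controls[g] are counts of individuals carrying g copies
    (g = 0, 1, 2) of the tested allele. Returns (odds_ratio, p_value),
    where p_value is a two-sided Wald test of b1 = 0.
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for dosage, (a, u) in enumerate(zip(cases, controls)):
            n = a + u
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * dosage)))
            resid = a - n * p            # observed minus expected cases
            w = n * p * (1.0 - p)        # binomial variance weight
            g0 += resid
            g1 += resid * dosage
            h00 += w
            h01 += w * dosage
            h11 += w * dosage * dosage
        det = h00 * h11 - h01 * h01
        # Newton step: beta += inverse(information matrix) @ gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    se_b1 = math.sqrt(h00 / det)         # from the inverse information matrix
    p_value = math.erfc(abs(b1 / se_b1) / math.sqrt(2.0))
    return math.exp(b1), p_value

# Hypothetical counts in which the odds of disease double with each allele copy.
odds_ratio, p_value = additive_logistic([100, 200, 400], [100, 100, 100])
print(round(odds_ratio, 3), p_value)
```

Because the example counts follow the additive model exactly, the fitted per-allele odds ratio is 2. In real data the estimate is the best linear-in-dosage fit on the log-odds scale.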

Power to detect association with low-frequency and rare variants of 
modest effect is limited in a sample of 2,657 individuals. To increase 
power for variants discovered via genome sequencing, we imputed 
sequence-based genotypes into 44,414 additional individuals of 
European origin (11,645 T2D cases and 32,769 controls; Methods) 
from 13 studies (Supplementary Table 3). We estimated power in 
the combined sequence plus imputed data, adjusting for imputation 
Figure 1 | Ascertainment of variants and single-variant results. 

a, Sensitivity of low-coverage genome sequence data to detect SNVs in 
the deep exome sequence data, relative to other variant catalogues. 
Points represent results for a specific minor allele count. All results 
assume odds ratio (OR) = 1 for all variants, unless stated otherwise. 


quality, to be >80% for variants with MAF > 5% and OR > 1.23, or 
MAF > 0.5% and OR > 1.92 (Extended Data Fig. 2). A meta-analysis 
combining results for the sequence and imputed data identified 674 
variants across 14 loci associated with T2D at genome-wide significance 
(Fig. 1c). We observed a previously undescribed association with 
a common variant near CENPW (rs11759026, MAF = 23.2%, 
Pmeta = 3.5 x 107°; Fig. 1c) and replicated this association in an additional 
14,201 cases and 100,964 controls from the DIAGRAM consortium 
(P = 2.5 x 1074; Pcombined = 1.1 x 1071; Methods). The EML4 
signal detected in the sequence data was not replicated in the imputed 
data (P = 0.59; Pmeta = 0.26; Fig. 1c). 
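
The way a signal can be strong in one data set yet vanish on meta-analysis can be sketched with a fixed-effects, inverse-variance-weighted combination of per-study log odds ratios. This weighting scheme is an assumption made for illustration; the passage does not state which meta-analysis method was used, and the effect sizes below are hypothetical.

```python
import math

def inverse_variance_meta(betas, ses):
    """Fixed-effects meta-analysis of per-study log odds ratios.

    betas: per-study effect estimates (log odds ratios)
    ses:   their standard errors
    Returns (pooled_beta, pooled_se, two_sided_p).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    p_value = math.erfc(abs(beta / se) / math.sqrt(2.0))
    return beta, se, p_value

# Hypothetical: a small study with a large effect, a large study with none.
beta, se, p = inverse_variance_meta([0.30, 0.00], [0.06, 0.02])
print(round(beta, 3), round(se, 4), round(p, 3))
```

The larger study dominates the weights, so the pooled estimate is pulled towards zero even though the smaller study on its own looks convincing.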

To test for additional association signals, we performed con- 
ditional analysis at loci previously associated with risk of T2D 
(Methods). We identified two previously unreported association 
signals, both involving low-frequency variants, at a corrected significance 
threshold (α < 1.8 x 107°; Methods): one at the IRS1 locus 
(rs78124264, MAF = 2.2%, Pconditional = 2.5 × 10⁻⁷) and one upstream of 
PPARG (rs79856023, MAF = 2.2%, Pconditional = 9.2 × 10⁻⁷) (Extended 
Data Table 3). The PPARG signal overlaps regulatory elements in hASC 
pre-adipose and HepG2 cells, consistent with evidence that altered 
adipose regulation drives the primary PPARG signal. 


Analysis of coding variation 

The T2D-GENES consortium adopted a complementary strategy, 
focused on variants in protein-coding sequence, and seeking to improve 
power to detect rare-variant association by exploiting the more robust 
functional annotation of coding variation and the potential to aggregate 
multiple alleles of presumed similar impact in the same gene. We 
combined exome sequence data from 10,437 T2D cases and controls 
of diverse ancestry generated by T2D-GENES with the equivalent data 
from GoT2D. This created a joint data set (after all quality control) 
comprising 12,940 individuals (6,504 cases and 6,436 controls) drawn 
from five ancestry groups: 4,541 of European origin, and around 
2,000 (range: 1,943-2,217) each of South Asian, East Asian, Hispanic 
and African American origin (Extended Data Fig. 1, Extended Data 
Table 1 and Supplementary Table 4). Mean coverage was 82× across 
the coding sequence of 18,281 genes, identifying 3.04 million variants 
(1.19 million protein-altering; Supplementary Figs 5, 6). Each diploid 
genome carried a mean of 9,243 (range: 8,423-11,487) synonymous, 
7,636 (6,935-9,271) missense, and 250 (183-358) protein-truncating 
alleles (Supplementary Table 7). 

We tested for T2D association within the five ancestry groups, 
assuming an additive genetic model, using mixed-model approaches 
that account for population structure and relatedness, and combined 
ancestry-specific results by trans-ethnic meta-analysis 
(Methods). We estimate >80% power to detect (at genome-wide 
significance) T2D risk variants with MAF > 5% and OR > 1.36, or 
b, c, Manhattan plots of single-variant association analyses for: sequence 
data alone (b, 1,326 cases and 1,331 controls) and meta-analysis of 
sequence and imputed data (c, total of 14,297 cases and 32,774 controls). 
1000G, the 1000 Genomes Project data. 


MAF > 0.5% and OR > 2.29 (Methods and Extended Data Fig. 2). 
Only one variant reached genome-wide significance (PAX4 
Arg192His, rs2233580, P = 9.3 × 10⁻⁹) (Table 1, Extended Data 
Figs 3, 4 and Supplementary Fig. 8). This association was exclusive 
to East Asian individuals, in whom the 192His allele is common 
(MAF ≈ 10%) with a substantial effect size (allelic OR = 1.79 
(1.47-2.19)); 192His is virtually absent in individuals from other 
ancestries (MAF = 0.014%). The rs2233580 association was replicated in 
independent East Asian case-control data (n = 3,301; P = 5.9 × 10⁻⁷; 
Supplementary Table 9) and was distinct (r² < 0.05) from previously 
reported GWAS SNVs at the GCC1-PAX4 locus. PAX4 encodes a 
transcription factor involved in islet differentiation and function 
(Supplementary Table 10), and PAX4 variants have been implicated in 
early-onset monogenic diabetes. However, in East Asian cases, 192His 
was not associated with age of diabetes diagnosis (P = 0.64), indicating 
that this variant influences risk of type 2 diabetes rather than early-onset 
monogenic diabetes (Supplementary Table 9). 

To increase power to detect the association of rare variants that cluster 
in individual genes, we deployed gene-level variant aggregation 
tests across the exome sequence data (Methods and Supplementary 
Table 11). We observed no deviation from the null distribution 
of association statistics, and no single gene reached exome-wide 
significance (α = 2.5 × 10⁻⁶; Methods and Supplementary Figs 12, 13). 
When we focused on 634 genes that mapped to known GWAS regions, 
only FES exceeded a reduced significance threshold of α = 7.9 × 10⁻⁵ 
(PSouthAsian = 7.2 x 10~; Pmultiethnic = 1.9 x 10?) (Methods and 
Supplementary Fig. 14). This aggregate signal was driven entirely by 
the South Asian-specific Pro536Ser variant (MAF = 0.9%, OR = 6.7 
(2.6-17.3), P = 7.5 x 10~°), indicating that FES is likely to be the 
effector gene at the PRC1 GWAS locus. 
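
The reduced threshold for the 634-gene analysis is consistent with a Bonferroni correction, as the short sketch below shows. The family-wise error rate of 0.05 is an assumption (the passage gives only the resulting threshold), and `bonferroni_alpha` is an illustrative helper.

```python
def bonferroni_alpha(n_tests, family_alpha=0.05):
    """Per-test significance threshold after Bonferroni correction."""
    return family_alpha / n_tests

# 634 genes mapped to known GWAS regions (number taken from the text).
gene_set_alpha = bonferroni_alpha(634)
print(f"{gene_set_alpha:.1e}")  # prints 7.9e-05
```

Restricting the test to a smaller, pre-specified gene set relaxes the per-gene threshold by more than an order of magnitude compared with the exome-wide analysis. That is how a signal such as FES can pass the reduced threshold without reaching exome-wide significance.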

To increase power to detect coding variant associations (Extended 
Data Fig. 2), we contributed early T2D-GENES exome data to the 
design of an Illumina exome array, and then collected genotypes 
from an additional 28,305 T2D cases and 51,549 controls of European 
ancestry from 13 studies (Extended Data Fig. 1, Extended Data Table 1 
and Supplementary Table 15). Of 27,904 protein-altering variants 
with MAF > 0.5% detected in exome sequence data from 4,541 
European individuals, variation at 81.6% was captured on the array 
(Supplementary Fig. 16). 

Association analysis in the combined sequence and array data 
from more than 90,000 individuals identified 18 coding variants 
(17 non-synonymous) at 13 loci that exceeded genome-wide significance 
(α = 5 × 10⁻⁸; Table 1 and Extended Data Figs 3, 4). All of these 
were common (MAF > 5%) and all but one mapped within established 
common-variant GWAS regions. The exception, which we replicated 
in the INTERACT study (n = 9,292; PINTERACT = 2.4 x 1074; 
Pmeta = 2.2 x 107!), involved a common haplotype of four strongly 
Table 1 | Nonsynonymous coding variants achieving genome-wide significance 


Column groups: Exomes (n=12,940); Exome-chip (n=79,854); Combined (n=92,794) 
Columns: Locus, Gene, Variant, RAF range, Eur MAF, Alleles, then P and OR (95% CI) for each column group 
Established common causal coding variant signals 
GCKR GCKR rs1260326 0.49-0.86 0.37 C,T 0.075 05 4.8x10-° .07 1.2x10°-9 1.07 (1.04-1.10) 
Pro446Leu (0.99-1.11) (1.04-1.11) 
PPARG PPARG rs1801282 0.86-0.99 0.14 C,G 0.0030 16 1.8x 10-7 10 4.2x10-8 1,11 (1.07-1.15) 
Prol2Ala (1.06-1.27) (1.06-1.14) 
PAM/ PAM rs35658696 0.00-0.05 0.054 GA 0.00045 36 1.7x10-7 LS 5.7x10-!0 1.17 (1.11-1.24) 
PPIP5K2 Asp563Gly (1.14-1.63) (1.08-1.23) 
PPIP5K2 _rs36046591 =0.00-0.05 0.054 GA 0.0099 34 1.0 x 10-© 17 3.3x10°8 1.19 (1.12-1.26) 
Ser1207Gly (1.12-1.61) (1.10-1.25) 
SLC30A8  SLC30A8_ rs13266634 0.58-0.91 0.33 C,T 2.9 x 10-© TD 2.7x 10°18 14 4810-3 1.14(1.11-1.17) 
Arg325Trp (1.09-1.22) (1.11-1.17) 
KCNJ11/ KCNJ11 rs9215 0.08-0.40 0.40 Coal 0.11 .07 3.4x 10° .07 1.3109 1.07 (1.05-1.10) 
ABCC8 Val337Ille (1.01-1.13) (1.04-1.11) 
rs9219 0.06-0.40 0.40 T,C 0.056 .08 5.1x 10-9 .07 9.0x10-1° 1.07 (1.05-1.10) 
Lys23Glu (1.02-1.14) (1.04-1.11) 
ABCC8 rs757110 0.06-0.40 0.40 C,A 0.20 .06 2.3x10-® .07 1.7x10-8 1.07 (1.04-1.10) 
Alal369Ser (1.00-1.12) (1.04-1.11) 
Other coding variant associations within established common variant GWAS regions 
THADA THADA rs35720761 0.85-1.00 0.10 Cai 0.0021 2 3.5x 10-8 aT 3.3x10°1° 1.12 (1.07-1.16) 
Cys1605Tyr (1.01-1.23) (1.07-1.16) 
COBLL1 COBLL1 rs7607980 =0.84-1.00 0.12 T,C 1.4 10-° 21 4.71071! 14 8310-15 1.15 (1.11-1.19) 
Asn939Asp (1.11-1.33) (1.10-1.19) 
WFS1 WFS1 rs1801212 = 0.70-1.00 0.30 A,G 0.0026 14 9.3 x 10-12 .08 9.0x10-!4 1,09 (1.06-1.12) 
Val333lle (1.06-1.23) (1.04-1.12) 
rs1801214 0.59-0.96 0.41 T,C 0.0019 .08 2.0 x 10-14 .08 1510-4 1.08 (1.05-1.11) 
Asn500Asn (1.02-1.15) (1.05-1.11) 
rs734312 0.11-0.85 0.47 A,G 0.12 05 1.3x1071° .07 6.9x10-!! 1.06 (1.04-1.09) 
Arg611His (0.99-1.11) (1.03-1.10) 
RREB1 RREB1 rs9379084 0.87-0.98 0.11 G,A 2.2x 10-5 9 1.1 x 10-5 al? 4.0x10-? 1.13 (1.09-1.18) 
Asp1171Asn (1.09-1.30) (1.06-1.17) 
PAX4 PAX4. 182233580 0.00-0.10 0.00 T,C 9.3x 10-9 79 NA NA 9.3x10-9 1.79 (1.47-2.19) 
Arg192His (1.47-2.19) 
GPSM1* GPSM1* — rs60980157 0.26 0.26 C,T NA NA 1.7x 10-9 09 1.7x10°9 1.09 (1.06-1.12) 
Ser391Leu (1.06-1.12) 
CILP2 TM6SF2 —-rs58542926 =0.03-0.10 0.082 TC 0.00015 .22 1.9x 10-7 13 3.2x10°1° 1.14 (1.10-1.19) 
Glul67Lys (1.10-1.36) (1.08-1.18) 
Coding variant associations outside established common variant GWAS regions 
MTMR3/b = =MTMR3  rs41278853 =0.92-1.00 0.083 A,G 9.2x10-° .26 3.2x10-° a2 5.6x10°9 1.14 (1.09-1.19) 
ASCC2 Asn960Ser (1.12-1.42) (1.07-1.17) 
ASCC2 rs11549795 = 0.92-1.00 0.083 Car 0.00040 23. 2.0 x 10-° LT 1010-7 1.13 (1.08-1.18) 
Val123lle (1.10-1.38) (1.06-1.16) 
rs28265 0.92-1.00 0.083 ©C,G 0.00050 21 1.9x10-> 1 1.1x10-7 1.12 (1.08-1.17) 
Asp407His (1.08-1.36) (1.06-1.16) 
rs36571 0.92-1.00 0.083 G,A 0.0023 .23 2.0 x 10-° Td 3.0x10°7 1.12 (1.08-1.17) 
Pro423Ser (1.08-1.40) (1.06-1.16) 
These loci were identified through single-variant analyses of exome sequence data from 6,504 cases and 6,436 controls and exome arrays from 28,305 cases and 51,549 controls. RAF, risk allele 
frequency; Eur MAF, minor allele frequency in Europeans; OR, odds ratio; CI, confidence interval; n, total number of individuals analysed. Genome-wide significance defined as P < 5 × 10⁻⁸. 
*GPSM1 variant failed quality control in exome sequence: association P values derive only from exome-array analysis. The synonymous variant Thr515Thr (rs55834942) in HNF1A also reached 
genome-wide significance (P = 1.0 × 10⁻⁸) in the combined analysis. Three variants at ASCC2 did not achieve genome-wide significance themselves, but are included because they fall into a region 
with another significant association signal. Alleles are aligned to the forward strand of NCBI Build 37 and represented as risk and other alleles. 


correlated coding variants in MTMR3 and ASCC2 (Table 1). Of these 
variants, MTMR3 Asn960Ser (MAF = 8.3%) had the strongest residual 
association signal on conditional analysis, implicating MTMR3, which 
encodes a phosphatidylinositol phosphatase, as the probable effector 
transcript at this locus (Extended Data Table 3, Extended Data Figs 3, 4, 
Supplementary Table 10 and Supplementary Fig. 17). 

The remaining coding variant signals provided an opportunity 
to highlight causal alleles and effector transcripts for known GWAS 
signals. For five loci (SLC30A8, GCKR, PPARG, KCNJ11-ABCC8, and 
PAM), the coding variants identified had previously been nominated 
as causal for their respective GWAS signals. For the other seven 
loci, GWAS meta-analyses had previously highlighted a lead variant 
in non-coding sequence. We (re)evaluated these relationships 
with conditional and credible set analyses and found that, at most, the 
evidence supported a direct causal role for the coding variants concerned 
(Extended Data Table 3, Extended Data Figs 3, 4, Supplementary 
Table 10 and Supplementary Fig. 17). 

For example, at the CILP2 locus, previous GWAS had identified the 
non-coding variant rs10401969 as the lead SNV. However, direct genotyping 
of TM6SF2 Glu167Lys on the exome array revealed complete 
linkage disequilibrium with rs10401969, and reciprocal signal extinction 
in conditional analyses (Extended Data Table 3 and Extended 
Data Figs 3, 4). In previous GWAS, the association at Glu167Lys 
had been obscured by incomplete genotyping and poor imputation 
Figure 2 | Association between T2D and variants in genes for Mendelian 
forms of diabetes. a, P values (plotted as −log10 P) of aggregate 
association for variants from 6,504 T2D cases and 6,436 controls in three 
sets of Mendelian diabetes genes, for five variant ‘masks’ (see Methods). 
Dotted line, P=0.05. b, Estimated T2D odds ratio (OR) for carriers of 


(Supplementary Table 18). The TM6SF2 Lys167 allele has been shown 
to underlie predisposition to hepatic steatosis, and was associated 
with fasting hyperinsulinaemia (P = 1.0 x 10~*) in 30,824 non-diabetic 
controls from the present study. This combination of genetic and functional 
data, consistent with known mechanistic links between insulin 
resistance, T2D, and fatty liver disease, implicates TM6SF2 Glu167Lys 
as the most likely T2D risk variant at this locus. 

By contrast, the association at RREB1 Asp1171Asn represented a 
novel signal, conditionally independent of the adjacent common-variant 
GWAS signal. This association, together with that involving a second 
associated coding variant, Ser1554Tyr, which has a marked associa- 
tion with fasting glucose levels (P=2.7 x 10~° in 38,338 non-diabetic 
subjects from the present study; Supplementary Table 19), establishes 
RREB1 (ref. 23) as the probable effector gene at the SSR1 locus. 

Given the concentration of coding-variant associations within estab- 
lished GWAS loci, we sought to nominate additional single-variant 
signals in 634 genes that mapped to established T2D GWAS regions 
using a Bonferroni-corrected α = 1.6 x 107° (Methods, Supplementary 
Fig. 14 and Supplementary Table 20). At HNF4A, we confirmed a T2D 
association at Thr139Ile (European MAF range 0.7-3.8%, OR = 1.15 
(1.08-1.22), P = 2.9 x 10~°) that was distinct from both the common 
non-coding lead GWAS SNV and multiple rare HNF4A variants 
implicated in monogenic diabetes. Additional coding variant associations 
in TSPAN8 and THADA highlighted these two genes as probable 
effector transcripts in their respective GWAS regions (Supplementary 
Tables 10, 21). 


Rare alleles in Mendelian genes 

We extended gene-based tests for rare-variant associations to gene-sets 
implicated in monogenic or syndromic diabetes or in altered glucose 
metabolism. Across 81 genes harbouring rare alleles that are causal 
for monogenic or syndromic diabetes or related glycaemic traits 
(‘Monogenic all’; Supplementary Table 22), the only variant or gene 
association achieving genome-wide significance involved the previously 
mentioned PAX4 Arg192His. However, across the entire gene-set, we 
observed a weak aggregate association with T2D risk (P = 0.023; Fig. 2a). 
The association was considerably stronger in two subsets of genes 
that have been more directly implicated in monogenic and syndromic 
diabetes: a manually curated set of 28 genes for which diabetes was the 
primary phenotype (‘Monogenic primary’) and a partially overlapping 
variants in each gene-set and mask. c, Estimated ORs (bars, left axis) and 
P values (dots, right axis) for carriers of variants in the PTV + NSgrice mask 
for each gene. Red, OR > 1; blue, OR < 1; dotted line, P= 0.05. Error bars 
represent s.e. 


set of 13 genes reported in OMIM (http://www.omim.org) as causal for 
maturity onset diabetes of the young (MODY) or neonatal diabetes 
(‘Monogenic OMIM’; Supplementary Table 22). 

The ‘Monogenic OMIM’ gene-set had a statistically robust signal of 
association (P=2.8 x 10-°, OR=1.51 (1.25-1.83)) driven by the allelic 
burden of MAF < 1% alleles. Effect size estimates tracked with increasing 
stringency of variant annotation and gene-set definition, consistent 
with progressive enrichment for functional over neutral alleles (Fig. 2b). 
This signal does not reflect inclusion among T2D cases of individuals 
who, in reality, had monogenic diabetes: the association was not 
concentrated among genes most frequently responsible for monogenic 
diabetes (Fig. 2c), and age of diabetes diagnosis was no lower in variant 
carriers than non-carriers (Supplementary Fig. 23). The association 
signal remained after all alleles listed as ‘disease-causing’ within the 
Human Genetic Mutation Database were excluded (P=2.9 x 1074, 
OR= 1.50 (1.21-1.86)). 
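
An aggregate 'carrier' odds ratio of the kind quoted above can be illustrated by collapsing all qualifying rare alleles in a gene-set into a single carrier indicator and analysing the resulting 2 × 2 table. The sketch below is a simplification under stated assumptions: the study's tests account for ancestry and relatedness, whereas this uses a plain Wald interval, and the carrier counts are hypothetical. Only the totals of 6,504 cases and 6,436 controls come from the text.

```python
import math

def carrier_odds_ratio(case_carriers, case_noncarriers,
                       control_carriers, control_noncarriers, z=1.96):
    """Odds ratio for carrying at least one qualifying allele, with a Wald 95% CI."""
    odds_ratio = ((case_carriers * control_noncarriers)
                  / (case_noncarriers * control_carriers))
    se_log_or = math.sqrt(1 / case_carriers + 1 / case_noncarriers
                          + 1 / control_carriers + 1 / control_noncarriers)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical carrier counts among 6,504 cases and 6,436 controls.
or_, lo, hi = carrier_odds_ratio(300, 6504 - 300, 200, 6436 - 200)
print(round(or_, 2), round(lo, 2), round(hi, 2))
```

Collapsing to carrier status trades resolution for power: no single rare allele needs to be individually significant for the gene-set as a whole to show enrichment.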

These analyses point to widespread enrichment for T2D association 
among rare coding alleles in genes that are causal for monogenic diabetes. 
In these genes, alleles of penetrance sufficient to drive familial segregation 
of early-onset diabetes coexist alongside those with more modest effects 
predisposing carriers to later-onset T2D. No other compelling signals 
of rare-variant enrichment were detected using gene-set enrichment 
or protein-protein interaction analysis in other pre-defined gene-sets 
(Supplementary Figs 24, 26 and Supplementary Table 25). 


No evidence for synthetic association 
In 2010, Goldstein and colleagues proposed that common-variant GWAS 
signals may be the consequence of low-frequency and rare variants 
that by chance cluster on common haplotypes. Although this hypothesis 
has been debated and assessed indirectly, we used the near-complete 
ascertainment of genetic variation in 2,657 genome-sequenced individuals 
to directly test the importance of ‘synthetic’ associations. We 
focused on the ten T2D GWAS loci at which our sample provided the 
strongest statistical evidence for association (P< 0.001), implementing 
a conditional analysis procedure to assess whether combinations of 
SNVs within a 5-Mb window could explain the common-variant signal 
(Extended Data Table 4 and Methods). 

We first focused on missense variants, finding that none of the ten 
signals could be explained by low-frequency and rare variants within 
2.5 Mb of the common index SNV (Extended Data Fig. 5). For example, 
Figure 3 | Empirical T2D association results compared to results under 
different simulated disease models. Observed number of rare and 
low-frequency (MAF <5%) genetic association signals for T2D detected 
genome-wide after imputation compared to the numbers seen under 
three simulated disease models for T2D which were plausible given results 
(T2D recurrence risks, GWAS, linkage) before large-scale sequencing. 


at the IRS1 locus, including the five observed missense IRS1 alleles in 
the model did not meaningfully diminish the index SNV association 
(Punconditional = 2.8 x 1076; Pconditional = 4.3 x 10-9). With 99.7% ascertainment 
of low-frequency coding variants (Methods), these results 
rule out synthetic associations produced by missense variants at these 
ten loci. 

We expanded the search to include all low-frequency and rare 
variants, non-coding and coding, within 2.5 Mb of index SNVs. At no 
locus was a single low-frequency or rare variant sufficient to explain 
the GWAS signal (Extended Data Fig. 5). At eight of the ten loci, ten or 
more low-frequency and rare variants were needed to reverse the direc- 
tion of effect at the common index SNV; at TCF7L2, even 50 were insuf- 
ficient (Extended Data Fig. 5). We note that the statistical procedure we 
developed and deployed is biased in favour of the synthetic association 
hypothesis, since it is highly prone to over-fitting. Nonetheless, at 8 of 
the 10 loci the data were indistinguishable from a null model of no syn- 
thetic association (Extended Data Table 4 and Supplementary Fig. 27). 


Nominating candidate functional alleles 

Using the GoT2D whole-genome sequence data, we constructed 
99% ‘credible sets’ for each T2D GWAS locus on the assumption that 
there is one causal variant per locus (Methods). Across 78 published 
autosomal loci at which the reported index SNV had MAF > 1%, 
99% credible set sizes ranged from 2 (CDKN2A/B) to around 1,000 
(POU5F1) variants; at 71 loci, the credible set contained more than 
10 variants (Extended Data Fig. 5 and Supplementary Table 28). The 
GoT2D data set provides near-complete ascertainment of common and 
low-frequency variants, which supports more comprehensive credible 
set analysis than studies based on genotyping or imputation alone; of 
the credible set variants identified from whole-genome sequence data, 
about 60% are absent from HapMap and ~5% from 1000G Phase 1 
(Extended Data Fig. 5). 
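
The construction of a credible set under the single-causal-variant assumption can be sketched as follows. The sketch assumes that each variant has a Bayes factor for association and that every variant has the same prior probability of being causal; the variant names and log Bayes factors are hypothetical, and the consortium's exact procedure is described in its Methods rather than here.

```python
import math

def credible_set(log_bayes_factors, coverage=0.99):
    """Smallest set of variants whose posterior probabilities sum to `coverage`.

    log_bayes_factors maps variant ID -> natural-log Bayes factor for
    association. Assumes exactly one causal variant per locus and an equal
    prior probability for every variant.
    """
    peak = max(log_bayes_factors.values())
    # Subtract the maximum before exponentiating to avoid overflow.
    weights = {v: math.exp(lbf - peak) for v, lbf in log_bayes_factors.items()}
    total = sum(weights.values())
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    selected, cumulative = [], 0.0
    for variant, weight in ranked:
        selected.append(variant)
        cumulative += weight / total   # posterior probability of this variant
        if cumulative >= coverage:
            break
    return selected

# Hypothetical locus: one dominant variant and two weak ones.
print(credible_set({"variant_a": 10.0, "variant_b": 0.0, "variant_c": 0.0}))
```

A locus with one dominant variant yields a credible set of size one, whereas a locus where many variants are in strong linkage disequilibrium, and so have similar Bayes factors, yields a large set. This is why credible set sizes vary so widely between loci.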

Genomic maps of chromatin state or transcription factor binding 
have been used to prioritize causal variants within credible sets. We 
jointly modelled genetic association and genomic annotation data at 
T2D GWAS loci using fgwas. Consistent with previous reports, 
associated variants were enriched in coding exons, transcription factor 
binding sites, and enhancers active in pancreatic islets and adipose 
tissue (Extended Data Fig. 6). Overall, including the functional anno- 
tation data reduced the credible set size by 35%. At several loci, access 
to complete sequence data prioritized variants that overlapped with 
relevant regulatory annotations and had been previously overlooked. 
For example, at the CCND2 locus, three variants not present in HapMap 
Phase 2 have a combined probability of 90.0% of explaining the 
common-variant signal (Extended Data Fig. 6); one of these (rs3217801) 
is a 2-base pair (bp) indel overlapping an islet enhancer element. 


[Figure 3: panel labels include an intermediate model, in which low-frequency 
and rare variants explain ~50% of heritability, and a common polygenic model, 
in which low-frequency and rare variants explain ~25% of heritability. 
Parameter labels: T = 2.0 Mb, τ = 0.3; T = 3.75 Mb, τ = 0.1. Bar annotations: 
n = 143, n = 74, n = 38. Bars give the number of low-frequency and rare 
variants at two P-value thresholds, one of which is P < 5 × 10⁻⁸.] 

Simulated models were defined by two parameters: disease target size T 
and degree of coupling τ between the causal effects of variants and the 
selective pressure against them. Simulated data were generated to 
match GoT2D imputation quality as a function of MAF (see Methods). 
Error bars represent s.e. observed across simulation replicates (bar value 
shows the mean). 


Modelling disease architecture 

To evaluate the overall contribution of low-frequency coding variation 
to T2D risk, we estimated the proportion of variance in T2D liability 
attributable to each such variant (Methods and Extended Data 
Fig. 7). We focused on exome array data to maximize sample size, and 
on variants with MAF > 0.1%; the sensitivity of variant ascertainment 
and accuracy of OR estimation decline below this threshold. Among the 
31,701 variants on the exome array with 0.1% < MAF < 5%, there was 
a progressive increase in the maximum OR estimates with decreasing 
frequency. However, the liability variance explained for these variants 
rarely exceeded 0.05%, limiting the power to detect association in the 
sample size available (Extended Data Fig. 7). We estimated (Methods) 
that the liability variance that was collectively attributable to coding 
variants in the 0.1% < MAF <5% range was 2.9%, compared to 6.3% 
for common variants. 

Finally, we compared our whole-genome T2D association results 
with predictions from population genetic simulations under twelve 
models that varied widely with respect to the proportion of heritability 
explained by common, low-frequency, and rare variants. We mirrored 
the GoT2D study design (with imputation) and performed in parallel 
the same association analysis on empirical and simulated data, focusing 
on variants with MAF > 0.1% and allowing for power loss due to imperfect 
imputation (Methods). 

Figure 3 displays results for three representative models: a ‘purifying 
selection’ model in which low-frequency and rare variants explain 
approximately 75% of T2D heritability; an intermediate model 
in which both common and lower-frequency variants contribute 
substantially; and a ‘neutral’ model in which common variants 
explain about 75% of T2D heritability. The predictions of the first 
two models differ markedly from the empirical data with respect 
to the numbers of low-frequency and rare risk variants that are 
associated with T2D. Specifically, these two models predict a larger 
number and greater effect size of low-frequency variants should 
be found in our whole-genome sequencing study as compared to 
those observed in the empirical data. By contrast, the empirical 
data are consistent with predictions under the ‘neutral’ common- 
variant model. 

The century-old Mendelian-biometrician debate pitted those who 
attributed trait variation to rare variants of large effect against those 
who argued that trait variation was largely due to many common 
variants of small effect. The debate today is about whether the ‘missing 
heritability’ after GWAS is due largely to individually rare, highly 
penetrant variants or to a large universe of common alleles with modest 
effects. The results are of more than academic interest, as genetic 
architecture plays out powerfully in relation to the power of genetic 
diagnosis and the application of precision medicine. 

Our data and analysis indicate that, for T2D, nearly all common- 
variant associations detectable by whole genome sequencing were 
previously found by GWAS based on genotyping arrays and imputation; 
concerns about incomplete coverage due to ‘holes’ in HapMap 
coverage were, we show, unfounded. Of more lasting interest, the 
combination of genome and exome sequencing in large samples pro- 
vides limited evidence of a role for lower-frequency variants—coding 
or genome-wide—in T2D predisposition. Of course, rare risk alleles 
have long been known to contribute in families with early-onset forms 
of diabetes, and sequencing of Mendelian and GWAS genes has identified 
rare variants that influence disease risk. Sequencing of T2D 
cases in much larger samples will undoubtedly uncover additional 
low-frequency and rare variants that provide biological and poten- 
tially clinical value. Nonetheless, our empirical and simulated data 
suggest that these lower-frequency variants contribute much less to 
T2D heritability than do common variants. Moreover, the frequency 
spectrum of variant association signals is consistent with a model in 
which limited selective pressure distributes most of the genetic variance 
influencing T2D risk among common alleles, in line with the 
frequency distribution of inter-individual sequence variation. Similar 
large-scale sequencing-based exploration of other complex traits will 
be required to determine the extent to which the genetic architecture 
of T2D is representative of other late-onset diseases. 

Our results further strengthen the case for sequencing of diverse 
samples; the population-enriched T2D risk variant in PAX4 dovetails 
with similar findings involving SLC16A11 (ref. 45) in East Asian and 
Native American populations and TBC1D4 (ref. 46) in Greenland 
Inuits. Studies involving populations that have been subject to 
bottlenecks or extreme selective pressures may be particularly 
fruitful. 

Understanding the inherited basis of T2D will require much further 
progress in identifying the mechanisms whereby common, mostly 
non-coding, variants influence disease risk. The combination of global 
epigenetic measurements, genome editing, and high-throughput 
functional assays makes it increasingly practical to characterize large 
numbers of non-coding variants and the processes they affect. Genome 
sequencing in much larger numbers of individuals than included in 
the current study is needed and will no doubt provide foundational 
information to guide such experimentation and connect the results 
to human population variation, physiology, and disease. Integration 
of biological insights gleaned from common and rare variant associa- 
tions with T2D into a unified picture of disease pathophysiology will be 
required to fully understand the basis of this common but challenging 
disease. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 

METHODS 


Ethics statement. All human research was approved by the relevant institutional 
review boards and conducted according to the Declaration of Helsinki. All partici- 
pants provided written informed consent. 

Data generation 

GoT2D integrated panel generation 

GoT2D sequenced samples. Here we describe how we generated, processed, and 
carried out quality control (QC) on sequence and genotype data for the 2,891 indi- 
viduals initially chosen for GoT2D from four studies, and how this resulted in 2,657 
individuals (1,326 T2D cases and 1,331 non-diabetic controls) for analysis (Extended 
Data Fig. 1). We preferentially sampled early-onset, lean, and/or familial T2D cases 
and overweight controls with low fasting glucose levels. Specific details of selected 
samples are provided in Extended Data Table 1 and Supplementary Table 1. 

DNA sample preparation. De-identified DNA samples were sent to the Broad 
Institute (DGI, FUSION), Wellcome Trust Centre for Human Genetics in Oxford 
(UKT2D), and Helmholtz Zentrum München (KORA) and prepared for genetic 
analysis. DNA quantity was measured by Picogreen (all), and samples with suffi- 
cient total DNA and minimum concentrations for downstream experiments were 
genotyped for a set of 24 SNVs using the Sequenom iPLEX assay (DGI, FUSION, 
UKT2D): one gender assay and 23 SNVs located across the autosomes. The geno- 
types for these SNVs were used as a quality filter to advance samples and a technical 
fingerprint for subsequent sequencing and genome-wide array genotypes. 

Exome sequencing. Genomic DNA was sheared, end repaired, ligated with barcoded 
Illumina sequencing adapters, amplified, size selected, and subjected to in-solution 
hybrid capture using the Agilent SureSelect Human All Exon 44Mb v2.0 (DGI, 
FUSION, UKT2D) and v3.0 (KORA) bait sets (Agilent Technologies, USA). 
Resulting Illumina exome sequencing libraries were qPCR quantified, pooled, 
and sequenced with 76-bp paired-end reads using Illumina GAII or HiSeq 2000 
sequencers to ~82-fold mean coverage. 

Genome sequencing. Whole-genome Illumina sequencing library construction was 
performed as described for exome capture above, except that genomic DNA was 
sheared to a larger target size and hybrid capture was not performed. The resulting 
libraries were size selected to contain fragment insert sizes of 380 bp ± 20% (DGI, 
FUSION, KORA) and 420 bp ± 25% (UKT2D) using gel electrophoresis or the 
SAGE Pippin Prep (Sage Science, USA). Libraries were qPCR quantified, pooled, 
and sequenced with 101-bp paired-end reads using Illumina GAII or HiSeq 2000 
sequencers to ~5-fold mean coverage. 

HumanOmni2.5 array genotyping. Genotyping was performed by the Broad 
Genetic Analysis Platform. DNA samples were placed on 96-well plates and gen- 
otyped using the Illumina HumanOmni2.5-4v1_B SNV array. 

Alignment and processing of exome and genome sequence data 

Alignment of sequence reads to reference genome. Sequence data were processed 
and aligned to hg19 using the Picard (http://broadinstitute.github.io/picard/), 
BWA, and GATK pipelines. The resulting BAM and VCF files were submitted 
to NCBI and are available in dbGaP (accession number phs000840.v1.p1, 
study name NIDDK_GoT2D). 

Coverage and QC of aligned sequence reads. We excluded 151 exome samples with 
average coverage <20x in >20% of the target bases and 68 genome samples with 
average coverage <5x. After sequence alignment and post-processing, aligned 
sequence reads were screened based on multiple QC criteria, including number of 
mapped reads, number of mapped bases with <1% estimated base call error rate 
(>Q20), fraction of duplicate reads, fraction of properly paired reads, distribution 
of insert sizes, distribution of mean base quality with respect to sequencing cycles, 
and GC bias (Extended Data Fig. 1). 

Detecting and handling contamination of sequence reads. We assessed possible DNA 
contamination in the genome and exome sequence data with verifyBamID, using 
two methods. First, we estimated the contamination level of sequenced samples 
using allele frequencies estimated from the HumanOmni2.5 array on a thinned set 
of 100,000 markers with minor allele frequency (MAF) > 5%. Second, for samples 
with HumanOmni2.5 genotypes, we used these genotypes together with sequence 
data to estimate contamination and identify possible sample swaps. We excluded 
exome sequence data for 7 individuals and genome sequence data for 59 individ- 
uals with estimated contamination >2% using either method. Prior to variant 
calling, uncontaminated sample swaps were assigned to the correct sample label 
after searching for the matching pairs using the same method. 

GoT2D integrated panel genotype calling 

SNV identification. We processed whole-genome sequence reads across the remaining 
2,764 QC-passed individuals by two SNV calling pipelines: GotCloud (www.gotcloud. 
org) and GATK UnifiedGenotyper. We merged unfiltered SNV calls across the two 
call sets and then processed the merged site list through the SVM and VQSR filtering 
algorithms implemented by those pipelines. SNVs that failed both filtering algorithms 
were removed before genotyping and haplotype integration. For the 2,733 QC-passed 
exome-sequenced individuals, we used GATK UnifiedGenotyper to call SNVs. 


Illumina HumanOmni2.5 array genotyping. We used Illumina GenomeStudio 
v2010.3 with default clusters to call HumanOmni2.5 genotypes after comparing 
different clustering algorithms and observing that the default cluster resulted in 
highest concordance with sequence-based genotypes. Called genotypes were run 
through a standard QC pipeline; samples passing a call rate threshold of 95%, and 
genetic fingerprint (24-marker panel) and gender concordance were passed on 
to downstream GWAS QC. SNVs with GenTrain score <0.6, cluster separation 
score <0.4, or call rate <97% were considered technical failures at the genotyping 
laboratory and deleted before data release. We removed samples with call rate 
<98%, and SNVs monomorphic across all samples, failed by 1000G Omni 2.5 QC 
filter, or with Hardy-Weinberg equilibrium P < 10° (Extended Data Fig. 1). Eighty-five 
samples were removed in this process. 

Short insertion and deletion (indel) identification. For the whole-genome sequence 
data, we used the GATK UnifiedGenotyper to call short indels (<50 bp). Because 
short indels are known to have high false-positive rates due to systematic sequencing 
and alignment errors, we used stringent filtering criteria in SVM and VQSR 
and excluded indels that failed either algorithm. For exome sequencing, we used 
GATK UnifiedGenotyper to call short indels, following best practices described 
elsewhere. 

Large deletion identification. We used GenomeSTRiP to call large (>100-bp) deletions 
in the whole-genome sequence data. After initial discovery of large deletions 
in 2,764 QC-passed individuals, we merged the discovered sites with deletions 
identified in 1,092 sequenced individuals from the 1000G Project to increase sen- 
sitivity and then genotyped the merged site lists across the 2,764 individuals. After 
applying the default filtering implemented in GenomeSTRiP, pass-filtered sites 
variable in any of the samples were identified as candidate variant sites. Among 
these candidate sites, we excluded variants in known immunoglobulin loci to reduce 
the impact of possible cell-line artefacts. We then excluded 136 more individuals 
owing to an unusually large number of variants per sample (> median + (3 × mean 
absolute deviation)). Variants present only in these excluded individuals were 
removed from further analysis. 
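
The per-sample outlier rule above can be sketched as follows. This is an illustration only: it assumes the absolute deviation is measured around the median, which the text does not specify, and the function names and counts are hypothetical.

```python
from statistics import median


def outlier_threshold(counts, k=3.0):
    """Cutoff = median + k * mean absolute deviation (around the median)."""
    centre = median(counts)
    mean_abs_dev = sum(abs(c - centre) for c in counts) / len(counts)
    return centre + k * mean_abs_dev


def flag_outliers(counts_by_sample, k=3.0):
    """Return sample IDs whose variant count exceeds the cutoff."""
    cutoff = outlier_threshold(list(counts_by_sample.values()), k)
    return sorted(s for s, c in counts_by_sample.items() if c > cutoff)


# Hypothetical deletion counts per sample.
counts = {"s1": 100, "s2": 102, "s3": 98, "s4": 101, "s5": 99, "s6": 400}
print(flag_outliers(counts))
```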

GoT2D integrated panel haplotype integration 

Genotype likelihood calculation. We merged SNVs discovered from the three exper- 
imental platforms into one site list and calculated genotype likelihoods across all 
sites separately by platform. Because exome sequence data have substantial off- 
target coverage, we calculated likelihoods across the genome combining data from 
the genome and exome sequence experiments. For genome sequence, we calculated 
likelihoods using GotCloud; for exomes, we used GATK UnifiedGenotyper; for 
HumanOmni2.5 genotypes, we converted hard genotype calls into genotype likelihoods 
assuming a genotype error rate of 10°. For indels, we calculated likelihoods 
in a similar way except that the HumanOmni2.5 data could not be used. For struc- 
tural variants (SVs), genotype likelihoods were calculated from GenomeSTRiP 
using the whole-genome sequence data. 
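
As a rough illustration of converting an array hard call into genotype likelihoods, the sketch below assigns probability 1 − e to the called genotype and splits the error rate e equally between the other two genotypes. Both this error model and the example error rate are assumptions made for illustration; the function name is hypothetical.

```python
def hard_call_to_likelihoods(genotype, error_rate):
    """Convert a hard call (0, 1 or 2 alternate alleles) to likelihoods.

    Returns a tuple (L0, L1, L2). The called genotype gets 1 - error_rate;
    the remaining mass is split evenly between the other two genotypes
    (an assumed, simplified error model).
    """
    if genotype not in (0, 1, 2):
        raise ValueError("genotype must be 0, 1 or 2")
    if not 0.0 <= error_rate < 1.0:
        raise ValueError("error_rate must be in [0, 1)")
    return tuple(
        1.0 - error_rate if g == genotype else error_rate / 2.0
        for g in (0, 1, 2)
    )


# Illustrative error rate only; not the value used in the study.
print(hard_call_to_likelihoods(1, 0.01))
```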

Integration of genotype and sequence data. We calculated combined genotype 
likelihoods across each of the 2,874 individuals as the product of the corresponding 
genome, exome, and HumanOmni2.5 likelihoods assuming independent 
data across platforms (Extended Data Fig. 1). We then phased the genotype 
data using the strategy developed for 1000G Phase 1 (ref. 55). Specifically, we 
phased the integrated likelihoods using Beagle with 10,000 SNVs per chunk 
and 1,000 overlapping SNVs between consecutive chunks. We refined phased 
sequences using Thunder as implemented in GotCloud (http://genome.sph. 
umich.edu/wiki/GotCloud) with 400 states to improve genotype and haplotype 
quality. 
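
The two mechanical steps in this paragraph, multiplying per-platform likelihoods and splitting SNVs into overlapping chunks for phasing, can be sketched as below. The normalisation of the combined likelihoods and the handling of a platform with no data (`None`) are assumptions added for illustration, and both function names are hypothetical; this is not the Beagle or GotCloud implementation.

```python
def combine_likelihoods(*platform_likelihoods):
    """Multiply (L0, L1, L2) tuples across platforms, assuming independence.

    A platform with no data is passed as None and skipped. The result is
    normalised to sum to 1 (a convenience, not stated in the text).
    """
    combined = [1.0, 1.0, 1.0]
    for lik in platform_likelihoods:
        if lik is None:
            continue
        for g in range(3):
            combined[g] *= lik[g]
    total = sum(combined)
    if total == 0.0:
        raise ValueError("likelihoods are incompatible (product is zero)")
    return tuple(x / total for x in combined)


def chunk_windows(n_snvs, chunk_size=10_000, overlap=1_000):
    """Half-open (start, end) index windows sharing `overlap` SNVs."""
    if n_snvs <= 0:
        return []
    windows, start, step = [], 0, chunk_size - overlap
    while True:
        end = min(start + chunk_size, n_snvs)
        windows.append((start, end))
        if end == n_snvs:
            return windows
        start += step


print(combine_likelihoods((0.2, 0.7, 0.1), (0.1, 0.8, 0.1), None))
print(chunk_windows(25_000))
```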

GoT2D integrated panel QC. 2,874 individuals were available in the integrated 
haplotype panel. To identify population outliers, we carried out principal compo- 
nents analysis (PCA). We computed PCs for each of the three variant types (SNVs, 
short indels, large deletions) using EPACTS on an LD-pruned (r² < 0.20) set of 
autosomal variants obtained by removing large high-LD regions, variants with 
MAF < 0.01, and variants with Hardy-Weinberg equilibrium P< 10~°. Inspecting 
the first ten PCs for each variant type, we identified 43 population outliers and 
136 additional outliers for large deletions only; we excluded these 179 individuals. 
We excluded an additional 38 individuals based on close relationships (estimated 
genome-wide identity-by-descent proportion of alleles shared >0.20) with other 
study members. 2,657 individuals remained available for downstream analyses 
(Extended Data Fig. 1). 

GoT2D integrated panel evaluation of variant detection sensitivity. As we had no 
external data to evaluate SNV and indel variant detection sensitivity and genotype 
accuracy for our integrated haplotype panel, we evaluated accuracy for the low- 
pass whole-genome sequence data using the exome sequence data as a gold 
standard for variants at which exome sequence depth was >10. We consider 
the resulting sensitivity and accuracy estimates as lower bounds for the inte- 
grated panel, which combined information from the genome, exome, and 
HumanOmni2.5 data. 

We estimated the sensitivity of low-pass genome sequence data to detect true 
SNVs by calculating the proportion of exome-sequencing-detected SNVs detected 
by low-pass genome sequencing in the 2,538 individuals with data for all three 
experimental platforms. For exome sequence allele counts <1,000, we merged 
adjacent allele count bins until the number of alleles was > 1,000. We estimated 
the sensitivity of low-pass genome sequencing to detect common, low-frequency, 
and rare SNVs as 99.8%, 99.0%, and 48.2%, respectively. Similarly, we estimated the 
sensitivity of low-pass genome sequence to detect true short indels by calculating 
the proportion of exome sequencing-detected short indels detected by low-pass 
genome sequencing. Sensitivity estimates were >99.9%, 93.8%, and 17.9% for 
common, low-frequency, and rare short indels, respectively. 

To estimate the sensitivity of the combined low-pass genome and exome 
sequence data, we focused on coding SNVs and calculated the proportion 
of HumanOmni2.5 SNVs detected by either sequencing platform. Because 
HumanOmni2.5 SNVs are enriched for common variants, we calculated a weighted 
average of the per-allele-count sensitivities, weighting each allele count by the 
number of exome-detected variants at that count. Sensitivity estimates were 99.9%, 
99.7%, and 83.9% for common, low-frequency, and rare variants, respectively. 
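
The weighting step can be written out as a minimal sketch. The bin values below are hypothetical and the function name is illustrative; only the weighting scheme (sensitivity per allele count, weighted by the number of exome-detected variants at that count) follows the description above.

```python
def weighted_sensitivity(bins):
    """Weighted mean sensitivity.

    bins: list of (sensitivity, n_exome_variants) pairs, one per allele
    count (or merged allele-count bin).
    """
    total_weight = sum(weight for _, weight in bins)
    if total_weight == 0:
        raise ValueError("no exome-detected variants in any bin")
    return sum(sens * weight for sens, weight in bins) / total_weight


# Hypothetical bins: (sensitivity, number of exome-detected variants).
print(weighted_sensitivity([(0.5, 100), (1.0, 300)]))
```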

GoT2D integrated panel evaluation of genotype accuracy. To evaluate genotype accuracy 
for SNVs, we focused on chromosome 20, and compared the concordance of 
low-pass whole-genome-sequence-based genotypes with those based on exome 
sequence. Overall genotype concordance was 99.86%. Homozygous reference, 
heterozygous, and homozygous non-reference concordances were 99.97%, 98.34%, 
and 99.72%, respectively. We also compared genotype concordance between exome 
sequence and HumanOmni2.5 genotypes. Overall concordance was 99.4%. When 
the HumanOmni2.5 genotypes were homozygous reference, heterozygous, and 
homozygous non-reference, concordances were 99.97%, 99.69%, and 99.88%, 
respectively. 

We evaluated the genotype accuracy of indels for the 210 chromosome-20 indels 
that overlapped between those discovered by exome and genome sequencing. 
Overall genotype concordance was 99.4%. When the exome genotypes were 
homozygous reference, heterozygous, and homozygous non-reference, concord- 
ances were 99.8%, 95.8%, and 98.6%, respectively. 
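
Concordance stratified by the reference genotype, as reported above, can be computed along the following lines. This is a sketch under simple assumptions: genotypes are coded as alternate-allele counts (0, 1, 2), missing calls are `None`, and the function name and example data are hypothetical.

```python
def concordance_by_class(reference, test):
    """Overall and per-class genotype concordance.

    reference, test: equal-length sequences of genotypes coded 0, 1, 2
    (or None for missing). Classes are defined by the reference genotype.
    """
    agree = {0: 0, 1: 0, 2: 0}
    total = {0: 0, 1: 0, 2: 0}
    for ref, obs in zip(reference, test):
        if ref is None or obs is None:
            continue  # skip sites missing in either data set
        total[ref] += 1
        agree[ref] += int(ref == obs)
    n = sum(total.values())
    if n == 0:
        raise ValueError("no overlapping non-missing genotypes")
    overall = sum(agree.values()) / n
    per_class = {g: agree[g] / total[g] for g in total if total[g] > 0}
    return overall, per_class


overall, per_class = concordance_by_class(
    [0, 0, 1, 1, 2, None], [0, 0, 1, 0, 2, 1]
)
print(overall, per_class)
```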

To evaluate the genotype accuracy of our low-pass genome sequence data to 
detect true structural variants, we took advantage of the 181 individuals in our 
study who were previously included in the WTCCC array-CGH based structural 
variant detection experiment. Taking the WTCCC data as a gold standard, we 
estimated genotype accuracy across 1,047 overlapping structural variants (with 
reciprocal overlap >0.8) genome-wide. The overall genotype concordance was 
99.8%. When the WTCCC genotypes were homozygous reference, heterozygous, 
and homozygous non-reference, concordances were 99.9%, 99.6%, and 99.7%, 
respectively. 

GoT2D+T2D-GENES multiethnic exome panel generation and QC 

Samples. We considered 6,504 T2D cases and 6,436 controls from 14 studies of 
African American, East Asian, South Asian, Hispanic, and European ancestry. In 
contrast to the GoT2D whole-genome integrated panel, this data set also includes 
GoT2D individuals for whom whole-genome data were not available. Sample 
characteristics are provided in Extended Data Table 1 and Supplementary Table 4. 

Sequence reads were processed and aligned to the reference genome (hg19) 
with Picard (http://broadinstitute.github.io/picard/). Polymorphic sites and gen- 
otypes were called with GATK, with filtering of sites performed using Variant 
Quality Score Recalibration (VQSR) for SNVs, and hard filters for indels. Genotype 
likelihoods were computed controlling for contamination. 

Hard calls (the GATK-called genotypes, set to missing where genotype quality 
(GQ) was <20) and dosages (the expected value of the genotype, defined as 
Pr(RX|data) + 2Pr(XX|data), where R is the reference allele and X is the alternative allele) were computed for 
each sample at each variant site. Hard calls were used only for quality control, while 
dosages were used in all downstream association analyses. Multi-allelic SNVs and 
indels were dichotomized by collapsing alternate alleles into one category because 
downstream association analyses required bi-allelic variants. 
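
The three per-genotype operations in this paragraph can be sketched directly from their definitions. The function names are hypothetical, and genotypes are assumed to be coded as alternate-allele counts.

```python
def dosage(p_ref_alt, p_alt_alt):
    """Expected alternate-allele count: Pr(RX|data) + 2 * Pr(XX|data)."""
    return p_ref_alt + 2.0 * p_alt_alt


def hard_call(genotype, gq, min_gq=20):
    """Keep the called genotype, or return None (missing) if GQ < min_gq."""
    return genotype if gq >= min_gq else None


def collapse_allele(allele_index):
    """Dichotomize a multi-allelic site: 0 = reference, 1 = any alternate."""
    return 0 if allele_index == 0 else 1


print(dosage(0.5, 0.25))      # posterior 0.25 / 0.5 / 0.25 for RR / RX / XX
print(hard_call(1, 19))       # below the GQ threshold, so set to missing
print(collapse_allele(2))     # second alternate allele collapses to 1
```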

Individuals were excluded from analysis if they were outliers on one of mul- 
tiple metrics: poor array genotype concordance (where available), high number 
of variant alleles or singletons, high or low allele balance (average proportion of 
non-reference alleles at heterozygous sites), or excess mean heterozygosity or ratio 
of heterozygous to homozygous genotypes. 

Within this reduced set of individuals, we then performed extended QC using 
ethnicity and T2D status to provide high-quality genotype data for downstream 
association analyses. Within each ethnicity, we excluded variants based on hard 
call rate (<90% in any cohort), deviation from Hardy-Weinberg equilibrium 
(P<10 ° in any ancestry group), or differential call rate between T2D cases and 
controls (P< 10°‘ in any ancestry group). We then considered autosomal variants 
that passed extended QC and with MAF >1% in all ancestry groups for trans- 
ethnic kinship analyses. We calculated identity-by-state (IBS) between each pair of 
samples based on independent variants (trans-ethnic r² < 0.05) and constructed 
axes of genetic variation through PCA implemented in EIGENSTRAT to identify 
ethnic outliers (Supplementary Fig. 29). We also identified duplicates based on IBS, 
and excluded the sample from each pair with lowest call rate and/or mismatch with 
external information. The extended QC excluded 68 individuals, 9.9% of SNVs 
and 90.8% of indels from the clean data set. 

Association analysis 

Power calculation. We used the genetic power calculator (http://pngu.mgh. 
harvard.edu/~purcell/gpc/) to estimate power to detect T2D association 
assuming 8% prevalence. For the T2D-GENES+GoT2D exome sequence 
data set we assumed: (i) a fixed-effect across all five ancestry groups (12,940 
individuals); and (ii) an effect specific to one group (2,000 individuals) 
(Extended Data Fig. 2). We repeated our calculations for combined exome 
sequence and exome array data, assuming a fixed effect across all ethnicities, for 
an effective total sample size of 82,758 individuals (Extended Data Fig. 2). 

For the GoT2D integrated panel we allowed for incomplete variant detection by 
multiplying power by the estimated sensitivity to detect the variant as a function 
of MAF. For imputed variants, we first multiplied the sample size by the median 
imputation quality (rsq_hat) obtained from MaCH/Thunder or minimac for 
the corresponding MAF bin across the analysed cohorts, and then multiplied the 
estimated power by the fraction of variants that passed the imputation quality 
cutoff for that MAF bin. 
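
The imputation adjustment can be sketched as below. The power function here is a generic normal approximation for a 1-degree-of-freedom test and stands in for the genetic power calculator used in the study; it, the per-sample non-centrality parameter, and all example values are assumptions for illustration.

```python
from statistics import NormalDist


def power_1df(ncp_per_sample, n, alpha):
    """Approximate power of a two-sided 1-d.f. test (stand-in power model).

    ncp_per_sample: non-centrality contributed by each sample (hypothetical
    input; in practice it depends on MAF, OR and prevalence).
    """
    nd = NormalDist()
    z_crit = nd.inv_cdf(1.0 - alpha / 2.0)
    shift = (ncp_per_sample * n) ** 0.5
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)


def imputed_power(ncp_per_sample, n, alpha, rsq_hat, frac_pass):
    """Scale the sample size by rsq_hat, then the power by frac_pass."""
    return power_1df(ncp_per_sample, n * rsq_hat, alpha) * frac_pass


# Hypothetical inputs for one MAF bin.
print(imputed_power(0.001, 10_000, 0.05, rsq_hat=0.8, frac_pass=0.9))
```

The same pattern applies to the integrated panel, where the power is multiplied by the estimated variant-detection sensitivity instead.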

For gene-based tests in the T2D-GENES+GoT2D data, we made use of a 
Bonferroni correction for 20,000 genes, corresponding to P < 2.5 × 10⁻⁶. We used 
a simulated haplotype data set from the SKAT package (http://cran.r-project.org/ 
web/packages/SKAT/vignettes/SKAT.pdf) and estimated the power of SKAT-O 
to detect association of variants within a gene at this threshold as a function of 
the phenotypic variance (1%) in a liability scale explained by additive genetic 
effects and the percentage of variants that were causal (50% and 100%). As for 
single-variant power calculations, we considered: (i) a fixed-effect across all 
ethnicities (12,940 individuals); and (ii) an effect specific to one ancestry group 
(2,000 individuals) (Extended Data Fig. 2). 

GoT2D integrated panel association analysis 

Single-variant association analysis. We tested for T2D association in a logistic 
regression framework assuming an additive genetic model. We used the Firth 
bias-corrected likelihood ratio test as our primary analysis strategy; we 
repeated association analysis using the score test for inclusion in sample-size- 
weighted meta-analysis (Supplementary Table 2). Tests were adjusted for sex, 
the first two genotype-based PCs to account for population stratification, and 
an indicator function for observed temporal stratification based on sequencing 
date and centre. PCs were calculated using linkage-disequilibrium (LD) pruned 
(r² < 0.20) HumanOmni2.5M array variants with MAF >1% after removing large 
high-LD regions. 

Aggregate association analysis. To test for aggregate association within coding 
regions of the genome, we used the approach described in Gene-based analysis 
below. For every gene and mask tested, P values were greater than 2.5 x 10~*. 

We also tested for aggregate association among variants in non-coding regions 
of the genome. We aggregated variants in individual pancreatic islet enhancer 
elements (see Genomic annotation below), as these elements collectively demon- 
strated strongest genome-wide enrichment of T2D association. We performed both 
the burden and SKAT tests using genotypes from the integrated panel on variants 
with MAF < 5% in each islet enhancer element. We used a Bonferroni threshold 
P < 1.68 × 10⁻⁷ based on a nominal significance level of α = 0.05 corrected for 
298,240 elements with at least one variant. All elements tested in this manner had 
P values greater than 2.5 x 10°°. 
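
The Bonferroni thresholds quoted in this section follow from dividing the nominal significance level by the number of tests, as this short check shows (the helper name is illustrative).

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# Gene-based tests: alpha = 0.05 over 20,000 genes.
print(bonferroni_threshold(0.05, 20_000))
# Islet enhancer elements: alpha = 0.05 over 298,240 elements.
print(bonferroni_threshold(0.05, 298_240))
```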

GoT2D+T2D-GENES multiethnic association analysis 

Kinship analysis. Within each ancestry group, we considered autosomal var- 
iants that passed QC with MAF >1% for ethnic-specific kinship analyses. 
We calculated IBS between each pair of samples in the ancestry group based 
on independent variants (ethnic-specific r² < 0.05) and constructed a kinship 
matrix to account for intra-ethnic population structure and relatedness 
in downstream mixed-model (EMMAX)-based association analyses. 
We also used IBS to identify pairs of related individuals within each ances- 
try group (defined by pi-hat > 0.3). We then defined intra-ethnic related 
exclusion lists for downstream non-EMMAX association analyses using the 
following steps: (i) remove the control from each T2D-status discordant pair; and 
(ii) remove the sample with lowest call rate from each T2D-status concordant 
pair. We also constructed intra-ethnic axes of genetic variation through PCA 
implemented in EIGENSTRAT. We identified axes of genetic variation in each 
ancestry group for inclusion as covariates in downstream non-EMMAX association 
analyses to account for intra-ethnic population structure that: (i) explain at least 
0.5% genotypic variation; and/or (ii) demonstrate nominal association (P < 0.05) 
with T2D in logistic regression analysis. 
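
The two selection rules above (which member of a related pair to drop, and which axes of genetic variation to keep as covariates) can be sketched as follows. Skipping pairs that were already broken by an earlier exclusion is an assumption, as are the function names and example data.

```python
def related_exclusions(pairs, is_case, call_rate):
    """Choose one sample to exclude from each related pair.

    (i) T2D-status discordant pair: exclude the control.
    (ii) Concordant pair: exclude the sample with the lower call rate.
    """
    excluded = set()
    for a, b in pairs:
        if a in excluded or b in excluded:
            continue  # pair already broken (assumed behaviour)
        if is_case[a] != is_case[b]:
            excluded.add(b if is_case[a] else a)
        else:
            excluded.add(a if call_rate[a] < call_rate[b] else b)
    return excluded


def select_pcs(var_explained, t2d_pvalues, min_var=0.005, max_p=0.05):
    """1-based indices of PCs explaining >= 0.5% variance and/or P < 0.05."""
    return [
        i
        for i, (v, p) in enumerate(zip(var_explained, t2d_pvalues), start=1)
        if v >= min_var or p < max_p
    ]


# Hypothetical samples: a/b discordant, c/d both cases.
is_case = {"a": True, "b": False, "c": True, "d": True}
call_rate = {"a": 0.99, "b": 0.98, "c": 0.97, "d": 0.99}
print(related_exclusions([("a", "b"), ("c", "d")], is_case, call_rate))
print(select_pcs([0.02, 0.004, 0.001], [0.5, 0.01, 0.6]))
```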

Single-variant association analysis. Within each ancestry group, we performed 
a score test of T2D association with each variant passing ethnic-specific QC in 
a linear regression framework under an additive model in EMMAX. We also 
performed a Wald test of T2D association with each variant passing ethnic- 
specific QC in a logistic regression framework under an additive model with 
adjustment for ethnic-specific axes of genetic variation after exclusion of related 
samples (Supplementary Table 30). Within each ancestry group, we calculated 
genomic control inflation factors (score EMMAX and Wald) based on inde- 
pendent variants used for the ethnic-specific kinship analyses and corrected 
association summary statistics (P value and s.e.) to account for residual popu- 
lation structure. 
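
A minimal sketch of genomic control correction is given below. It uses the standard definition of the inflation factor (median observed 1-d.f. chi-square statistic divided by the expected median) and inflates the standard error by the square root of that factor. Not correcting when the factor is below 1 is a common convention assumed here, not something stated in the text; the function names are illustrative.

```python
import math
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364231195724


def genomic_control_lambda(chi2_stats):
    """Inflation factor from a set of independent 1-d.f. chi-square stats."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN


def gc_correct(beta, se, lam):
    """Return the corrected s.e. and two-sided P value for one variant."""
    lam = max(lam, 1.0)  # assumed convention: never deflate
    se_adj = se * math.sqrt(lam)
    chi2 = (beta / se_adj) ** 2
    # Survival function of chi-square (1 d.f.): erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return se_adj, p_value


print(gc_correct(beta=0.2, se=0.1, lam=1.0))
```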

Subsequently, we performed trans-ethnic fixed-effects meta-analysis of ancestry- 
specific association summary statistics at each variant based on: (i) sample size 
weighting of score EMMAX directed P values; and (ii) inverse-variance weighting 
of Wald beta/s.e. (to obtain unbiased estimates of allelic odds ratios and con- 
fidence intervals that cannot be constructed from EMMAX effect estimates). 
We also performed trans-ethnic meta-analysis of ancestry-specific association 
summary statistics (score EMMAX beta/s.e.) at each variant using MANTRA, 
using pair-wise mean allele frequency differences at the subset of independent 
variants used for trans-ethnic kinship analyses as a prior for relatedness between 
ancestry groups. 
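
The two fixed-effects weighting schemes can be sketched with their textbook formulas. The sample-size-weighted version below takes signed Z-scores as input (the directed P values would first be converted to Z-scores) and weights each by the square root of the sample size; the function names and example values are hypothetical.

```python
import math


def inverse_variance_meta(betas, ses):
    """Fixed-effects inverse-variance-weighted estimate and its s.e."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    return beta, math.sqrt(1.0 / total)


def sample_size_weighted_z(z_scores, sample_sizes):
    """Combine signed Z-scores with weights proportional to sqrt(N)."""
    weights = [math.sqrt(n) for n in sample_sizes]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    return numerator / math.sqrt(sum(w * w for w in weights))


# Hypothetical per-ancestry log odds ratios, standard errors and Z-scores.
print(inverse_variance_meta([0.1, 0.3], [0.1, 0.1]))
print(sample_size_weighted_z([2.0, 2.0], [100, 100]))
```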

Validation of PAX4 association signal in additional East Asian studies. We validated 
the PAX4 Arg192His (rs2233580) association signal in an additional 1,789 T2D 
cases and 1,509 controls of East Asian ancestry from Hong Kong, Korea, and 
Singapore (Supplementary Table 9). Within each study, we tested for association 
with T2D in a logistic regression model, and combined association summary 
statistics across studies through fixed-effects meta-analysis (Supplementary Table 9). 
Among T2D cases, we also tested for association with age of diagnosis in a linear 
regression model, and combined association summary statistics across studies 
through fixed-effects meta-analysis (Supplementary Table 9). 

Admixture analysis. Admixed populations can offer greater statistical power to 
detect association because diverse ancestry increases genetic variation. However, 
admixture can also introduce false-positive signals due to population stratification 
and heterogeneity of effects because of differential LD. To assess the contribution 
of ancestral background in the two admixed groups (African American 
and Hispanic), we inferred local ancestry based on SNVs in available GWAS data 
using two approaches. For African Americans, we ran HAPMIX using CEU 
and YRI haplotypes from HapMap as reference, and estimated the proportion of 
European ancestry at each genomic position. For Hispanics, we ran Multimix 
using European, West African, and Native American haplotypes from HapMap 
as reference, and estimated the proportion of European ancestry at each genomic 
position, since we observe only a very low West African contribution (1.1-3.2%, 
Supplementary Fig. 31). We then repeated our intra-ethnic EMMAX-based analyses 
within African American and Hispanic ancestry groups, this time adjusting for 
local ancestry by including the estimated proportion of European ancestry at each 
variant as a covariate. Adjustment for local ancestry resulted in numerically similar 
association statistics as those from unadjusted analyses in the African American 
and Hispanic samples. 

Gene-based analysis. We generated four variant lists (‘masks’) based on MAF 
and functional annotation. We mapped variants to transcripts in Ensembl 66 
(GRCh37.66). Using annotations from CHAOS v0.6.3, SnpEFF v3.1, and VEP 
v2.7, we identified variants predicted to be protein-truncating (for example, 
nonsense, frameshift, essential splice site), denoted PTV-only or ‘Mask 1’; or 
protein-altering (for example, missense, in-frame indel, non-essential splice site) in 
at least one mapped transcript (by at least one of the three algorithms) with MAF 
<1%, denoted PTV+missense or ‘Mask 2’. We additionally used the procedure 
described by Purcell et al.”° to identify subsets of missense variants with MAF 
<1% meeting ‘strict’ or ‘broad’ criteria for being deleterious, using annotation 
predictions from PolyPhen2-HumDiv, PolyPhen2-HumVar, LRT, MutationTaster, 
and SIFT; variants predicted deleterious by all five algorithms or by at least one 
algorithm were denoted PTV+NSstrict or ‘Mask 3’ and PTV+NSbroad or ‘Mask 4’, 
respectively. Indels predicted by CHAOS, SnpEFF, or VEP to introduce frameshifts 
were included in the ‘strict’ category. We calculated MAFs for each ancestry using 
high-quality genotype calls (GQ > 20) for all samples passing extended QC. We 
considered a variant to have MAF < 1% if MAF estimates for every ancestry group 
were <1%. 
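
The frequency rule in the last sentence can be written as a small check. This is a sketch only: the function name and the ancestry labels in the usage note are hypothetical.

```python
def passes_rare_mask(maf_by_ancestry, threshold=0.01):
    """Return True only if the MAF estimate is below the threshold
    in every ancestry group (not merely in the pooled sample)."""
    return all(maf < threshold for maf in maf_by_ancestry.values())
```

For example, `passes_rare_mask({"group_a": 0.002, "group_b": 0.03})` is false, because the variant exceeds 1% in one group even though it is rare in the other.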

We used the MetaSKAT R package (v0.32)!° with the SKAT v0.93 library to per- 
form SKAT-O”! analysis within each ancestry, and in meta-analysis. Within each 
ancestry group, we analysed genotype dosages with adjustment for ethnic-specific 
axes of genetic variation after exclusion of 96 related individuals. We assumed 
homogeneous allele frequencies and genetic effects for all studies within an 
ancestry group. We performed meta-analysis using genotype-level data, allowing for 
heterogeneity of allele frequencies and genetic effects between (but homogeneity 
within) ancestry groups. All analyses were completed using the recommended rho 
vector for SKAT-O: (0, 0.1², 0.2², 0.3², 0.5², 0.5, 1). 

Imputed data 

Samples. We carried out genotype imputation into 44,414 individuals 
(11,645 T2D cases and 32,769 controls) from 13 studies using the GoT2D 
integrated haplotypes as reference panel. Characteristics of the imputed studies are 
provided in Extended Data Table 1 and Supplementary Table 3. 

Single-variant association meta-analysis. The one sequenced and thirteen imputed 
studies totalled 12,971 T2D cases and 34,100 controls. Each study performed its 
own sample- and variant-based QC. In each study, SNVs with minor allele count 
(MAC) > 1 passing QC were tested for T2D association assuming an additive 
genetic model adjusting for study-specific covariates. Association testing was 
performed using logistic regression Firth bias-corrected, likelihood ratio, or 
score tests as implemented in EPACTS (genome.sph.umich.edu/wiki/EPACTS) 
or SNPTEST”. To account for related samples in the Framingham Heart Study, 
generalized estimating equations (GEE) were used, as implemented in R. Residual 
population stratification for each study was accounted for using genomic 
control”’, We then carried out fixed-effects sample-size weighted meta-analysis as 
implemented in METAL”. 

Conditional analyses in established GWAS loci. We compiled a list of 143 previously 
reported genome-wide significant SNVs in 81 T2D autosomal loci (i) from Morris 
et al. and Voight et al.’; (ii) from papers they referenced; and (iii) from references 
in the NHGRI GWAS catalogue”*. We LD pruned these SNVs (r² < 0.95), yielding 
a list of 129 SNVs. We deleted the CILP2 locus (and two SNVs) from subsequent 
whole-genome analyses owing to large regions in which no variants passed QC, 
resulting in a list of 127 index SNVs at 80 autosomal loci. To identify additional 
T2D-associated variants within these 80 T2D autosomal loci in the genome-wide 
data, we repeated GWA analysis for 12 of the 13 studies (conditional analysis results 
for FHS were unavailable), conditioning on the 127 index SNVs. We performed 
fixed-effects inverse-variance meta-analysis to combine conditional analysis results 
from the studies totalling 12,298 cases and 26,440 controls. For each known locus, 
we analysed all SNVs within 500 kb of the known index SNVs; if there were multi- 
ple known index SNVs, we analysed all SNVs within 500 kb of the most proximal 
and distal index SNVs. We imposed a conditional-analysis significance threshold 
of α = 1.8 × 10⁻⁶ based on a proportional number of multiple tests for ~83 Mb of 
the ~3,000-Mb genome. 
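
The arithmetic behind this threshold can be reproduced in a few lines. The starting value of 5 × 10⁻⁸ is an assumption (the conventional genome-wide significance level); the text states only that the threshold is proportional to the fraction of the genome tested.

```python
GENOME_WIDE_ALPHA = 5e-8   # assumption: conventional genome-wide threshold
GENOME_MB = 3000           # approximate genome size from the text
TESTED_MB = 83             # approximate span of the conditional analyses

# Scale the genome-wide threshold by the inverse of the tested fraction.
alpha_conditional = GENOME_WIDE_ALPHA * GENOME_MB / TESTED_MB
print(f"{alpha_conditional:.1e}")   # prints 1.8e-06
```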

Exome array data 

Samples. We considered 28,305 T2D cases and 51,549 controls from 13 studies of 
European ancestry, genotyped with the Illumina exome array. Characteristics of 
the studies are provided in Extended Data Table 1 and Supplementary Table 15. 

Overlap of exome sequence variation with exome array. We assessed overlap of 
variants present on the exome array with those observed in our trans-ethnic 
exome-sequence data. As exome array primarily contains SNVs that are predicted 
to be protein altering, we focused on nonsense, essential splice site, and missense 
variants. Only variants passing QC in both sequence and array data were included 
in our overlap assessment. 

Data processing, QC, and kinship analysis. Within each study, exome array 
genotypes were initially called using Illumina GenCall (http://www.illumina.com/ 
Documents/products/technotes/technote_gencall_data_analysis_software.pdf) 
and Birdseed”°. Sample and variant QC was then undertaken within each study 
based on several quality control filters. Criteria for sample exclusion included 
low call rate (<99%), mean heterozygosity, high singleton counts, non-European 
ancestry, sex discrepancy, GWAS discordance (where data were available), geno- 
typing platform fingerprint discordance, and duplicate discordance. Variants were 
excluded based on call rate (<99%), deviation from Hardy-Weinberg equilib- 
rium (P< 107°), duplicate, chromosome or allele mismatch, GenTrain score <0.6, 
cluster separation score <0.4, and manual cluster checks. Missing genotypes were 
subsequently re-called using zCall, with a second round of QC to exclude poor 
quality samples (call rate <99% and mean heterozygosity) and variants (call rate 
<99%). Within each study, we considered independent autosomal variants that 
passed QC with MAF > 1% for kinship analyses, and calculated IBS between each 
pair of samples. We used these statistics to: (i) identify non-European ancestry 
samples to be excluded from all downstream analyses; (ii) construct a kinship 
matrix to account for fine-scale population structure and relatedness in down- 
stream EMMAX-based association analyses; (iii) identify related samples to be 
excluded from downstream non-EMMAX association analyses; and (iv) calculate 
axes of genetic variation for inclusion as covariates in downstream non-EMMAX 
association analyses to account for fine-scale population structure (if required). 

Single-variant association analysis. Within each study, we performed a score test 
of T2D association with each variant passing QC in a mixed-model regression 
framework under an additive model in EMMAX"*. We also performed a Wald 
test of T2D association with each variant in a logistic regression framework under 
an additive model with adjustment for axes of genetic variation after exclusion of 
related samples. For each test, we corrected s.e. and P value for the genomic 
control inflation factor (if > 1) calculated on the basis of the independent autosomal 
variants used for kinship analysis. 

Across studies, we performed fixed-effects meta-analysis of association summary 
statistics at each variant based on: (i) inverse-variance weighting of score EMMAX 
beta/s.e.; (ii) sample size weighting of score EMMAX directed P values; and (iii) 
inverse-variance weighting of Wald beta/s.e. For each of these meta-analyses, 
we applied a second round of correction of s.e. and P value by genomic control, 
again calculated based on the independent autosomal SNVs used for kinship 
analyses. 

Combined exome sequence and exome array single-variant analysis. We considered 
variants that were represented both in the exome sequence and on the exome 
chip. We began by performing fixed-effects meta-analysis of association summary 
statistics (after correction for genomic control, as described above) from the exome- 
chip meta-analysis and the European ancestry sequenced samples using: (i) inverse- 
variance weighting of score EMMAX beta/s.e.; (ii) sample size weighting of score 
EMMAX directed P values; and (iii) inverse-variance weighting of Wald beta/s.e. 
Subsequently, we performed trans-ethnic fixed-effects meta-analysis of ancestry- 
specific association summary statistics (after correction for genomic control, 
as described above) at each variant based on: (i) sample size weighting of score 
EMMAX directed P values; and (ii) inverse-variance weighting of Wald beta/s.e. 

Gene-based analyses. We made use of the four variant masks defined for exome 
sequence gene-based analyses, but with MAF calculated across all exome array 
studies. Within each study, we performed SKAT-O analyses”!, with adjustment 
for axes of genetic variation after exclusion of related samples. We combined P 
values for association across studies via meta-analysis with Stouffer’s method”’. 
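
Stouffer’s method can be sketched as follows. The text does not say whether studies were weighted, so the optional `weights` argument (equal weights by default) is an assumption, as is the function name.

```python
import math
from statistics import NormalDist

_norm = NormalDist()

def stouffer(p_values, weights=None):
    """Combine one-sided P values by summing their Z scores."""
    if weights is None:
        weights = [1.0] * len(p_values)   # assumption: unweighted by default
    zs = [_norm.inv_cdf(1.0 - p) for p in p_values]
    z = sum(w * zi for w, zi in zip(weights, zs)) / math.sqrt(
        sum(w * w for w in weights)
    )
    return 1.0 - _norm.cdf(z)
```

Two studies that each give P = 0.05 combine to a smaller P value, whereas two studies at P = 0.5 combine to 0.5.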

Evaluating relationships between association signals for coding variants and 
previously reported lead SNVs at established GWAS loci. For coding variants that 
mapped to established T2D susceptibility loci and achieved genome-wide sig- 
nificance in combined exome sequence and/or exome array analysis, we used 
complementary approaches with a range of available genetic data resources to 
evaluate their contribution to the association signals of previously reported lead 
SNVs. If the previously reported lead SNV (or a good proxy, r² > 0.8) was 
genotyped on the exome array, we performed reciprocal conditional analyses with the 
available exome array data. Within each study, we repeated EMMAX analyses in 
GWAS loci, including additively coded genotypes at the previously reported? lead 
SNV or genome-wide significant coding variant as an additional covariate in the 
regression model. Across studies, we performed fixed-effects meta-analysis of asso- 
ciation summary statistics at each variant based on: (i) inverse-variance weighting 
of score EMMAX beta/s.e.; (ii) sample size weighting of score EMMAX-directed 
P values. If the previously reported lead SNV (or a good proxy) was not genotyped 
on the exome array, we performed approximate reciprocal conditional analysis, 
implemented in GCTA’’, using genome-wide meta-analysis association summary 
statistics from 12,971 T2D cases and 34,100 controls from the combined GoT2D 
integrated panel and imputed data. Patterns of LD between variants were estimated 
using a subset of the GoT2D integrated panel, restricted to 2,389 individuals with 
pairwise genetic relationship <0.025, as defined by the GCTA A statistic”. Finally, 
we interrogated 99% credible sets of variants at each GWAS locus, which together 
represent > 99% of the probability of driving each association signal. We deter- 
mined whether the coding variant at each locus was included in the credible set for 
the association signal for the previously reported lead SNV, and recorded its rank. 

Enrichment of exome association signals in GWAS. To define T2D-associated 
intervals, we first identified all SNVs associated with T2D in published genome- 
wide association studies (GWAS) by searching literature and the NHGRI GWAS 
catalogue (see also Conditional analyses in established GWAS loci above). We iden- 
tified 143 autosomal SNVs, with some associated in more than one ancestry (167 
SNV-ancestry pairs). For each SNV-ancestry pair, we identified the most distant 
pair of SNVs with r² > 0.5 in 1000G Phase I data, using the appropriate 
continental subset of 1000G samples (EUR, AMR, or ASN). We used 1000G data, rather 
than our own exome sequence data, because most reported associations for T2D 
are with common, intergenic SNVs. We then extended each region of interest by 
moving out 0.02 cM from those two SNVs (to encompass nearby recombination 
hotspots), and added an additional 300 kb upstream and downstream. We merged 
overlapping intervals, yielding 81 unique associated regions, and identified 634 
genes completely or partially included within associated regions. In single-variant 
analyses, we analysed 3,147 non-synonymous variants within these genes in the 
combined exome sequence and exome array data sets, using a Bonferroni-corrected 
significance threshold of α = 0.05/3,147 = 1.6 × 10⁻⁵. We considered gene-level 
association statistics from exome sequence for these 634 genes using a Bonferroni- 
corrected significance threshold of α = 0.05/634 = 7.9 × 10⁻⁵. 

We note that by reducing the stringency of the significance threshold for vari- 
ants within GWAS loci, we increase the ‘experiment-wise’ type I error rate across 
the entire exome. Assuming that 3% of 100,000 coding variants interrogated in 
this study map to T2D GWAS loci, as defined above, we would need to change the 
threshold of significance outside of these regions to P < 2.1 × 10⁻⁸ to maintain an 
‘experiment-wise’ type I error rate of 5%. 
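
The Bonferroni thresholds and the experiment-wise adjustment can be checked numerically. The last step is one reading of the calculation, not a statement of how the authors computed it: it assumes the rounded per-variant threshold of 1.6 × 10⁻⁵ is spent on the 3% of variants inside GWAS loci and that the remainder of the 5% budget is divided among the other variants.

```python
# Bonferroni thresholds inside T2D-associated regions
alpha_variant = 0.05 / 3147      # 3,147 non-synonymous variants
alpha_gene = 0.05 / 634          # 634 genes

# Experiment-wise budget under the text's assumption that 3% of
# 100,000 coding variants map to T2D GWAS loci.
n_coding = 100_000
n_in_loci = n_coding * 3 // 100                 # 3,000 variants
spent = n_in_loci * 1.6e-5                      # error budget used inside loci
alpha_outside = (0.05 - spent) / (n_coding - n_in_loci)

print(f"{alpha_variant:.1e} {alpha_gene:.1e} {alpha_outside:.1e}")
# prints 1.6e-05 7.9e-05 2.1e-08
```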

Testing for ‘synthetic associations’ at T2D loci in GoT2D genome sequence data. 
To identify low-frequency or rare variants that could potentially define synthetic 
associations, we analysed the ten T2D loci at which a previously-reported tag SNV 
achieved P< 0.001 in our single-variant analysis of the genome sequence data set. 
We defined as candidates at each locus all low-frequency or rare variants (excluding 
singletons) within a 5-Mb window (centred on the prior GWAS signals) and tested 
for synthetic associations caused by either (i) a single low-frequency or rare variant 
or (ii) multiple low-frequency or rare variants on a common haplotype. 

To identify synthetic associations driven by a single low-frequency or rare variant 
at each of the ten loci, we performed a series of conditional analyses in which 
we tested for association between gene dosage at the previously reported GWAS 
index SNV and T2D risk via logistic regression, while including each candidate 
low-frequency or rare SNV (excluding singletons) as an additional covariate, 
one-by-one. If inclusion of the low-frequency or rare variant resulted in a condi- 
tional association P > 0.05 for the tag SNV, we considered the common-variant 
association signal a potential synthetic association. 

To identify synthetic associations based on sets of low-frequency or rare vari- 
ants, we extended this approach. We (i) defined common haplotypes segregating 
at each T2D locus; (ii) identified all low-frequency or rare (excluding singletons) 
variants occurring on T2D-associated haplotypes (haplotypes on which the T2D- 
associated GWAS index SNV minor allele is present); and (iii) asked whether 
any combination of these low-frequency or rare variants could explain the effect 
observed at the T2D GWAS index SNV. We carried out these analyses restricting 
attention to protein-coding variants within the window and then again for all 
low-frequency and rare SNVs in the 5-Mb window. 

To define common haplotypes at each locus, we used the phased whole-genome 
sequence data. We first employed the phased genotypes for common (MAF > 5%) 
variants segregating in the interval between recombination hotspots at the locus 
(to minimize the number of recombinant haplotypes identified). We next identified 
the haplotypes on which the T2D-associated (risk or protective) GWAS index 
SNV minor allele was present. We then assembled the set of low-frequency and 
rare variants from across the 5-Mb interval which occurred on the background of 
these T2D-associated common-variant haplotypes. Owing to recombination and 
imperfect phasing, low-frequency or rare (excluding singletons) variants are often 
observed on more than one haplotype background. We included all low-frequency 
or rare variants that occurred more frequently on a T2D-associated haplotype than 
on other haplotypes. 

From this pool of low-frequency and rare variants, we considered only variants 
with the same direction of effect as the common GWAS index SNV minor allele, as 
required by the synthetic association hypothesis, which posits that low-frequency 
or rare variants of larger effect than the common SNV could induce a weaker asso- 
ciation signal. We then used a greedy algorithm to select the low-frequency or rare 
variant which, when added to the index GWAS SNV’s dosage in a logistic regres- 
sion, most reduced the residual effect remaining at the index SNV, as measured by 
estimated conditional OR. We repeated this process, adding variants to the model, 
until the estimated effect at the index SNV genotype or gene dosage changed sign, 
representing no residual effect of the index SNV. At each locus, we also counted the 
number of variants required to increase the association P value at the GWAS index 
SNV beyond the nominal P= 0.05 significance threshold (Extended Data Table 4). 

Credible set analysis of GoT2D genome sequence data. At 78 of the 80 T2D 
GWAS loci (see Conditional analyses in established GWAS loci above), the pre- 
viously reported index SNV had MAF > 1% in our GoT2D genome-sequenced 
sample. At these 78 loci, we constructed credible sets of common variants that, 
with some minimum specified probability (for example, > 99%), contain the var- 
iant causal for the corresponding association signal. Our analysis assumes a sin- 
gle causal SNV per signal and that the SNV was genotyped*”*!. We constructed 
credible sets for up to two independent association signals at each locus; at five 
loci with multiple independent (r² < 0.10) GWAS index SNVs, we constructed 
two distinct credible sets. 

For each GWAS index SNV, we identified the set of common variants with 
r² > 0.10 with the index SNV within a 5-Mb window centred on the index SNV. For 
each variant in this set, we calculated the posterior probability of being causal*!. We 
first calculated an approximate Bayes’ factor (ABF) for each variant as: 

$$\mathrm{ABF} = \sqrt{1-r}\; e^{r z^{2}/2}$$

where r = 0.04/(s.e.² + 0.04), z = β/s.e., and β and s.e. are the estimated effect size 
(log OR) and its standard error from logistic regression. We then calculated the 
posterior probability for each variant as ABF/T, where T is the sum of the ABF 
values over all candidate variants across the interval. This calculation assumes a 
Gaussian prior with mean 0 and variance 0.04 for β, the same prior employed in 
the commonly used single-variant association program SNPTEST”. 
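
The approximate Bayes’ factor and the resulting posterior probabilities can be computed as in the following sketch, which assumes the form ABF = sqrt(1 − r) · exp(r z² / 2) with prior variance 0.04. The function names are hypothetical; for very large |z| a log-scale implementation would be safer than this direct one.

```python
import math

PRIOR_VAR = 0.04   # variance of the Gaussian prior on the log OR

def approx_bayes_factor(beta, se, prior_var=PRIOR_VAR):
    """Approximate Bayes factor from an effect estimate and its s.e."""
    r = prior_var / (se ** 2 + prior_var)
    z = beta / se
    return math.sqrt(1.0 - r) * math.exp(r * z * z / 2.0)

def posterior_probabilities(betas, ses):
    """Posterior probability of each variant: ABF divided by the sum of ABFs."""
    abfs = [approx_bayes_factor(b, s) for b, s in zip(betas, ses)]
    total = sum(abfs)
    return [a / total for a in abfs]
```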

We based the analysis on the genome-wide meta-analysis results, since most 
common variants were included in this analysis, and sample sizes were significantly 
larger than for the genome sequence data alone. 

We calculated the effective imputed sample size for each variant in the 
meta-analysis data as 

$$N_{\mathrm{eff}} = \sum_{j} r_{j}^{2}\, n_{j}^{\mathrm{eff}}$$

where r_j² is the imputation quality and n_j^eff is the effective sample size for 
imputation cohort j. To ensure approximately uniform sample size across variants, 
we considered to be well-imputed only those 
variants with effective imputed sample size (N_eff) > 80% of the maximum observed 
across all variants in the window. 
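
A sketch of the effective-sample-size filter is given below. It assumes that the per-cohort imputation quality enters as r² multiplied by the cohort’s effective sample size and that the 80% cut-off is strict; the function names and variant labels are hypothetical.

```python
def effective_imputed_n(r2_by_cohort, n_eff_by_cohort):
    """Sum over cohorts of imputation quality (r^2) times effective sample size."""
    return sum(r2 * n for r2, n in zip(r2_by_cohort, n_eff_by_cohort))

def well_imputed(n_eff_by_variant, fraction=0.8):
    """Keep variants whose effective N exceeds a fraction of the window maximum."""
    cutoff = fraction * max(n_eff_by_variant.values())
    return {v for v, n in n_eff_by_variant.items() if n > cutoff}
```

With effective sample sizes of 1,000, 850, and 700 in a window, the cut-off is 800 and only the first two variants are retained.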

Indels were not imputed or meta-analysed in this study, and <2% of common 
SNVs were not well-imputed by the above effective sample size criterion. To include 
these common variants while using the most precise estimates available, we calcu- 
lated posterior probabilities separately from each genome-wide data source. Where 
an indel from the sequence data set had a SNV proxy in high LD (r² > 0.80) in the 
meta-analysis data set, we used the proxy’s information instead. Where a common 
SNV that was poorly imputed had high-quality association data from the genome 
sequence data alone, the posterior probability from the genome sequence data set 
was used instead. In each case, the final posterior probabilities for all SNVs were 
re-scaled such that their sum across a locus equaled one. 

We used these final posterior probabilities to rank variants in decreasing order. 
To define credible sets of a specified level (for example, 99%), we included variants 
with highest final posterior probabilities until their sum reached or exceeded that 
level (Supplementary Table 28). 
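
The construction of a credible set from ranked posterior probabilities can be sketched as follows; the function name and the variant labels used in the usage note are hypothetical.

```python
def credible_set(posteriors, level=0.99):
    """Add variants in decreasing order of posterior probability until
    the cumulative probability reaches or exceeds the requested level."""
    ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)
    chosen, total = [], 0.0
    for variant, prob in ranked:
        chosen.append(variant)
        total += prob
        if total >= level:
            break
    return chosen
```

With posterior probabilities of 0.90, 0.06, 0.035, and 0.005, the 99% credible set contains the first three variants and the 95% credible set contains the first two.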

Genome enrichment analyses of the GoT2D genome sequence data 

Genomic annotation. We collected genome annotation data from several sources. 
First, we obtained gene transcript information from GENCODE v14 (ref. 80). For 
protein-coding genes, we included transcripts with a protein-coding tag that 
either were present in the conserved coding DNA sequence (CCDS) database or 
had experimentally confirmed mRNA start and end; we then included 5’ UTR, 
exon, and 3’ UTR regions from the resulting transcripts. For non-coding genes, we 
included transcripts with a ncRNA, miRNA, snoRNA, or snRNA tag. 

Second, we defined regulatory chromatin states in 12 cell types. We collected 
sequence reads generated for the following assays: H3K4me1, H3K4me3, 
H3K27ac, H3K27me3, H3K36me3, and CTCF ChIP, in nine ENCODE cell types 
(GM12878, K562, HepG2, HSMM, HUVEC, NHEK, NHLF, hESC, HMEC)”, 
pancreatic islets*°, and hASC (adipose stromal cell) pre- and mature adipocytes*’. 
We mapped reads to hg19 using BWA®*! and used the resulting mapped reads 
for all cell types to call regulatory states using ChromHMM™, assuming ten 
states. We then assigned names to the resulting state definitions: (i) H3K4me3, 
H3K27ac (active promoter); (ii) H3K4me3, H3K27ac, H3K4me1 (active enhancer 
1); (iii) H3K27ac, H3K4me1 (active enhancer 2); (iv) H3K4me1 (weak enhancer); 
(v) H3K27me3, H3K4me3, H3K4me1 (poised promoter); (vi) H3K27me3 
(repressed); (vii) low/no signal 1; (viii) CTCF (insulator); (ix) low/no signal 2; 
and (x) H3K36me3 (transcription). 

Third, we obtained transcription factor-binding ChIP sites from three sources: 
141 proteins from ENCODE™, 5 from Pasquali et al.*°, and 1 from Mikkelsen 
et al.>?, 

From gene transcript data we defined CDS (protein coding transcript exons); 
ncRNA (non-coding RNA transcripts); and 3’ and 5’ UTR (UTR regions of coding 
transcripts). From chromatin state data for each of the 12 cell types we identified 
active enhancers (pooled active enhancer 1 and 2 elements); weak enhancers; and 
active promoters. From transcription factor binding sites we defined transcription 
factor-binding sites (TFBS) (sites pooled across all factors). This resulted in a total 
of 41 annotation categories (Extended Data Fig. 6). 

Enrichment of genome annotation. We jointly modelled variants in credible sets 
using T2D association and the functional annotation classes using the method 
described by Pickrell**. First, we tested each annotation individually and identified 
the annotation that most improved the model likelihood. We then iteratively added 
annotations in this manner until the likelihood did not increase further. Using this 
set of annotations, we tested a range of penalized likelihoods (from 0 to 1 in 0.01 
increments) using tenfold cross-validation, and identified the penalty that gave the 
best cross-validation likelihood. Using this penalty, we then iteratively dropped 
annotations to identify the model with the maximal cross-validation likelihood. 
The resulting model included coding exons, TFBS, hASC mature adipose active 
enhancers and promoters, pancreatic islet active and weak enhancers and active 
promoters, hASC pre-adipose active and weak enhancers, NHEK active enhancers, 
NHLF active enhancers, K562 weak enhancers, HMEC weak enhancers and active 
promoters, H1-hESC active promoters, ncRNA, and 5’ and 3’ UTR (Extended Data 
Fig. 6). Finally, we used this model to update posterior probabilities for each variant 
and re-calculate 99% credible sets. 


Gene enrichment analyses in the GoT2D+T2D-GENES exome sequence data. 
We first used the SMP (statistics/matrix/permutation) gene-set enrichment pro- 
cedure implemented in the PLINK/Seq package (https://atgu.mgh.harvard.edu/ 
plinkseq/). This approach calculates enrichment statistics for large sets of genes to 
establish whether case-enrichment of rare variants is preferentially concentrated in 
a particular set of genes, controlling for any exome-wide/baseline difference in case 
and control rates. The procedure uses gene-based association statistics, and forms 
sums of these statistics over all genes in a set, the significance of which is evaluated 
by permutation. We considered the relative enrichment statistic, S_SET/S_EXOME, 
with significance evaluated empirically (10,000 replicates) based on the null 
distribution of this ratio. The reported effect sizes from the gene-set enrichment 
analysis are estimates of the unconditional odds ratio that do not take exome-wide 
differences in case/control rates into account”. We selected 18 ‘premium’ sets of 
genes (Supplementary Table 32) that reflect the current knowledge of pathways 
(N= 15) involved in T2D and the three sets of genes involved in monogenic forms 
of diabetes defined above: ‘Monogenic all’ (N= 81); ‘Monogenic primary’ (N= 28); 
and ‘Monogenic OMIM’ (N= 13). We restricted these analyses to singleton and 
ultra-rare (MAF < 0.1%) protein-truncating variants. 

We then used biological knowledge to test for enrichment of association 
signal across established sets of genes from Gene Ontology, KEGG, Reactome, 
and Biocarta collections from MSigDB (version 4.0) as well as a number of hand- 
curated gene-sets (Supplementary Table 32) that had been generated for the SMP 
analyses. These analyses calculated measures of gene-set enrichment from gene- 
level association results (that is, from SKAT-O) by means of a pre-ranked GSEA™ 
method (version 2.0.13), which consists of a weighted Kolmogorov-Smirnov 
(random bridge) statistic. In our analysis we performed 10,000 permutations on 
gene-set sizes from 5 to 5,000 genes. 

Investigation of genes implicated in Mendelian forms of diabetes in the exome 
data. We first curated a list of 81 genes termed the ‘Monogenic all’ gene-set 
(Supplementary Table 22), consisting of genes with pathogenic mutations reported 
to co-segregate with diabetes or a syndrome associated with an increased preva- 
lence of diabetes. Two subsets of the ‘Monogenic all’ gene-set were then addition- 
ally defined: the ‘Monogenic primary’ gene-set (N= 28), consisting of genes with 
mutations leading to diabetes as a primary feature, and the ‘Monogenic OMIM’ 
gene-set (N= 13), consisting of genes linked to MODY or neonatal diabetes in 
the OMIM catalogue (entries 606391 and 606176). In addition to examining the 
significance of single-variant and gene-based tests within these gene-sets, we 
also performed an aggregate analysis of all variants in the gene-set. For each of 
the three gene-sets, we constructed five variant lists by applying the same four 
masks as in the exome-wide gene-level analysis (PTV-only, PTV+missense, 
PTV+NSbroad, and PTV+NSstrict), as well as an additional mask containing all 
variants reported as ‘high confidence’ and ‘disease-causing’ in the Human Gene 
Mutation Database (HGMD), annotated using Biobase ‘GenomeTrax’ software 
(http://www.biobase-international.com/product/genome-trax). We then analysed 
each of the fifteen variant lists with the SKAT-O test, using the same meta-analysis 
procedure and covariates as in the exome-wide gene-based analysis. To obtain 
effect-size estimates, for each variant list we applied a collapsing burden test, in 
which logistic regression of T2D status was performed on individual genotypes 
encoded as 0 (if they carried no variants in the list) or 1 (if they carried at least one 
variant in the list). Effect size estimates and standard errors were determined using 
the Firth penalized likelihood method. Analysis in the exome array data set was 
performed by first generating fifteen variant lists based on the content of the exome 
array, computing the collapsing burden test for each cohort, and then combining 
associations and effect size estimates using an inverse variance weighted meta- 
analysis. To compare the age of diagnosis of variant carriers to those of non-carriers, 
we used a two-sided t-test. 
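
The genotype encoding used by the collapsing burden test can be sketched as below. Only the 0/1 carrier coding is shown; the Firth penalized logistic regression applied to it is not implemented here. The function name and the per-individual genotype dictionary are hypothetical.

```python
def carrier_indicator(genotypes, variant_list):
    """Return 1 if the individual carries at least one variant in the list,
    and 0 otherwise.

    genotypes: dict mapping variant ID to alternate-allele count for one
    individual; variants absent from the dict are treated as not carried.
    """
    return 1 if any(genotypes.get(v, 0) > 0 for v in variant_list) else 0
```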

Protein-protein interaction analyses in the exome data. We performed data- 
driven extraction of association signal enriched sub-networks (rather than relying 
on pre-defined gene-sets) from protein-protein interaction (PPI) data. We used 
two different approaches, both run using the curated PPI database InWeb3 (ref. 83). 

The first approach consists of two steps. First, the entire human PPI network 
was searched for protein complexes (clusters) using the algorithm implemented 
in clusterONE™, which identifies protein complexes with high cohesiveness. The 
method was run with default parameter settings (0.3 as density threshold, 0.8 as 
merging threshold, and 2 as the penalty-value node), and with the --fluff option 
activated, which allows the addition of highly connected boundary nodes to the 
cluster. Second, gene-based association P values derived from SKAT-O analyses of 
the 12,940 multiethnic exome sequences were aggregated, using Fisher’s method, 
for the genes encoding each of the proteins within a cluster to generate a ‘cluster 
association statistic’. 
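
Fisher’s method for aggregating the gene-based P values within a cluster can be sketched as follows. The function name is hypothetical; the closed-form tail probability applies because the chi-square reference distribution has an even number (2k) of degrees of freedom.

```python
import math

def fisher_combined_p(p_values):
    """Fisher's method: X = -2 * sum(ln p) follows a chi-square
    distribution with 2k d.f. under the null, for k independent P values."""
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)   # X / 2
    # Survival function for even d.f.: exp(-x/2) * sum_{i<k} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total
```

A single P value is returned unchanged, which gives a quick sanity check of the implementation.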

An empirical P value for the significance of these aggregated cluster association 
statistics was derived by comparing each cluster to a large number of complexes 
of the same topology, but composed of randomly sampled proteins. Specifically, 
a background distribution was obtained for each protein complex as follows: each 
protein in the cluster was randomly substituted by a different protein represented 
in the InWeb3 database, matched for number of minor allele carriers in the data set. 
SKAT-O P values were assigned to each protein from the exome sequencing results, 
and an aggregated P value was obtained for each pseudo-complex using Fisher's 
method, as above. This process was repeated 100,000 times, and the empirical 
P value for each complex was calculated as the proportion of the iterations for 
which the Fisher's P value of the observed complex was more significant than that of 
P values for the pseudo-complexes. This procedure was repeated for all gene-level 
masks (PTV-only, PTV+missense, PTV+NSstrict and PTV+NSbroad). 

To test the study-wide significance of apparently associated clusters, we used 
two permutation designs. In the first design, we generated 100,000 pseudo- 
complexes for each cluster, replacing each protein within each cluster with one 
protein from InWeb3, matched for the number of minor allele carriers in the data 
set. We calculated the number of permuted data sets which generated any ‘pseudo- 
cluster’ association P value more significant than our most enriched cluster. In the 
second design, we used a Monte-Carlo algorithm to generate 10,000 random PPI 
networks, with the same degree as observed in the InWeb3 database, ran 
clusterONE on each, and once again compared the distribution of ‘best’ cluster association 
P value with that observed in the real data. 

The second approach uses the dense module searching algorithm (a heuristic 
‘greedy’ method) described in dmGWAS, where a module is defined as a sub-network 
within the whole network if it contains a locally increased proportion of low P value 
genes. This method differs from the earlier method in using the association 
P values, in combination with the PPI data, to construct the networks. The module 
is grown for each protein in the PPI by adding the neighbouring nodes within a 
pre-defined distance (d = 2) that can yield a maximum increment of the module 
score $Z_m = \sum_i Z_i / \sqrt{k}$ for module m, where k is the number of genes in the module 
and $Z_i$ is calculated from the P value of exome gene-based tests using an inverse 
normal distribution function. The addition of neighbourhood nodes is stopped 
when the increment is less than 10% of $Z_m$ (that is, $Z_{m+1} < Z_m \times (1 + 0.1)$). As 


with the clusterONE approach, this procedure was conducted for all four exome 
gene-based level masks. 
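The greedy module growth described above can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the dmGWAS code: the graph is a plain adjacency dictionary, the function names are hypothetical, and the seed is assumed to have a positive score so that the 10% rule behaves sensibly.

```python
import math
from statistics import NormalDist

def z_from_p(p):
    """Map a gene-based P value to a Z score via the inverse normal CDF."""
    return NormalDist().inv_cdf(1.0 - p)

def module_score(genes, z):
    """Z_m = sum(Z_i) / sqrt(k) for the k genes in the module."""
    return sum(z[g] for g in genes) / math.sqrt(len(genes))

def within_distance(graph, module, d):
    """Nodes outside the module reachable within d edges of any member."""
    frontier, seen = set(module), set(module)
    for _ in range(d):
        frontier = {n for g in frontier for n in graph.get(g, ())} - seen
        seen |= frontier
    return seen - set(module)

def grow_module(seed, graph, z, d=2, r=0.1):
    """Greedy expansion: add the best node until Z_{m+1} < Z_m * (1 + r)."""
    module = [seed]
    while True:
        current = module_score(module, z)
        candidates = sorted(within_distance(graph, module, d))
        if not candidates:
            return module
        best = max(candidates, key=lambda n: module_score(module + [n], z))
        if module_score(module + [best], z) < current * (1.0 + r):
            return module
        module.append(best)
```

In practice `z` would be built as `{gene: z_from_p(p)}` from the gene-based P values, and `grow_module` would be called once per protein in the network.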

To evaluate whether the top-ranked modules are significantly associated with 
T2D, we permuted case-control status across the 12,940 exomes (maintaining ethnic 
strata) 10,000 times and generated 10,000 SKAT-O gene-based association tests on 
all genes in the top 15 modules (once for each gene-based variant mask, 40,000 in 
total). During each permutation, $Z_m$ was re-calculated for each module, and a set 
of empirical P values was obtained by comparing the P value of the original module 
to these modules with the SKAT-O results from the swapped labels. Following 
the above procedure, all 15 top modules were found significantly enriched for 
the PTV + NSstrict and PTV + NSbroad gene-based variant masks (P < 10⁻⁴, after 
the 10,000 case-control permutations). 
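Permuting case-control status while maintaining strata amounts to shuffling labels separately within each ancestry group. The sketch below shows that step and a generic empirical P value. Function names are hypothetical, and the add-one convention is an assumption, not a detail reported in the text.

```python
import random

def permute_within_strata(labels, strata, rng):
    """Shuffle case/control labels separately inside each stratum."""
    permuted = list(labels)
    index_by_stratum = {}
    for i, s in enumerate(strata):
        index_by_stratum.setdefault(s, []).append(i)
    for indices in index_by_stratum.values():
        shuffled = [labels[i] for i in indices]
        rng.shuffle(shuffled)
        for i, lab in zip(indices, shuffled):
            permuted[i] = lab
    return permuted

def empirical_p(observed_stat, permuted_stats):
    """Share of permutations with a statistic at least as large as observed."""
    hits = sum(1 for s in permuted_stats if s >= observed_stat)
    return (hits + 1) / (len(permuted_stats) + 1)

# Toy example: 1 = case, 0 = control, two hypothetical ancestry strata.
labels = [1, 1, 0, 0, 1, 0, 0, 0]
strata = ["EUR", "EUR", "EUR", "EUR", "SAS", "SAS", "SAS", "SAS"]
print(permute_within_strata(labels, strata, random.Random(1)))
```

Because shuffling is confined to each stratum, the number of cases per ancestry group is unchanged in every permuted data set.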

Modelling disease architecture 

T2D liability risk and architecture bounding in the exome array data. We used 
a Bayesian framework implemented in R to compute the probability that each var- 
iant explains more than a defined amount of the T2D-risk liability-scale variance 
(LVE). The joint distribution in the MAF-OR space is computed by assuming a 
T2D prevalence of 8% and beta and normal distributions for the MAF and OR, 
respectively. The OR is calculated with reference to the minor allele. The MAF 
is adjusted to take account of apparent allele frequency heterogeneity between 
cohorts (subjects from missing cohorts are excluded from calculations). Analyses 
are restricted to variants with MAF >0.1% because the representation of variants 
with MAF below this threshold on the exome array is poor. The probability is 
obtained by numerically integrating over the joint distribution for MAF-OR com- 
binations that explain more than the defined amount of liability-scale variance. 
For bounding the maximum number of variants that could contribute to T2D risk 
variance, we performed a sensitivity analysis on the 88 known T2D index SNVs 
present on the exome array to define the thresholded variance explained and the 
probability: this analysis shows that for a probability of >0.8 to explain 0.01% of the 
T2D risk variance, we were able to identify 91% of these known T2D SNVs. Ranges 
of OR and MAF consistent with 80% power to detect single-variant association in 
this data set (for exome-wide significance, P < 5 × 10⁻⁷) were calculated to reflect 
the fact that differences in sample size for individual variants (due to differences 
in allele frequency distribution and genotyping QC) also influence power. The 
relationship between power and LVE differs for risk and protective alleles because 
of unequal numbers of cases and controls. 
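To make the notion of liability-scale variance explained concrete, the sketch below uses a standard liability-threshold approximation in which the OR is treated as a relative risk under a multiplicative model. This is an assumption for illustration; the text does not state which LVE formula was used. The Monte Carlo step stands in for the numerical integration and assumes the normal distribution applies to the log OR; all parameter values are hypothetical.

```python
import math
import random
from statistics import NormalDist

def liability_variance_explained(maf, odds_ratio, prevalence=0.08):
    """Approximate liability-scale variance explained by one biallelic variant."""
    norm = NormalDist()
    p = maf
    p_aa, p_ab, p_bb = (1 - p) ** 2, 2 * p * (1 - p), p ** 2
    rr1, rr2 = odds_ratio, odds_ratio ** 2  # OR used as relative risk
    f_aa = prevalence / (p_aa + p_ab * rr1 + p_bb * rr2)
    f_ab, f_bb = rr1 * f_aa, rr2 * f_aa
    if max(f_aa, f_ab, f_bb) >= 1.0:
        raise ValueError("implied penetrance >= 1; approximation breaks down")
    t = norm.inv_cdf(1 - f_aa)           # liability threshold for genotype aa
    mu_ab = t - norm.inv_cdf(1 - f_ab)   # mean liability shift, heterozygote
    mu_bb = t - norm.inv_cdf(1 - f_bb)   # mean liability shift, homozygote
    mu_all = p_ab * mu_ab + p_bb * mu_bb
    v_g = (p_aa * mu_all ** 2
           + p_ab * (mu_ab - mu_all) ** 2
           + p_bb * (mu_bb - mu_all) ** 2)
    return v_g / (1 + v_g)

def prob_lve_exceeds(threshold, maf_a, maf_b, log_or_mean, log_or_sd,
                     n_draws=20000, seed=0):
    """Monte Carlo estimate of P(LVE > threshold) over a MAF-OR distribution."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        maf = rng.betavariate(maf_a, maf_b)
        odds_ratio = math.exp(rng.gauss(log_or_mean, log_or_sd))
        if liability_variance_explained(maf, odds_ratio) > threshold:
            hits += 1
    return hits / n_draws
```

A variant would then be flagged when `prob_lve_exceeds` returns more than the chosen probability cut-off, such as the 0.8 used in the sensitivity analysis.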

Genetic architecture simulations based on GoT2D data and results 

Range of simulated disease models. Following our previously published framework, 
we conducted population genetic simulations of T2D architecture using 
the forward simulation program ForSim. We assumed a T2D prevalence of 8% and 
heritability ~45%, and chose the mutation rate, recombination rate, a gamma 
distribution of selection coefficients, and other parameters of demographic history 
by fitting the simulated site frequency spectrum to empirical high-coverage exome 
sequence data from GoT2D. 

We then considered a wide range of disease models by varying two parameters: 
coupling parameter τ, which regulates how strongly selection against a 
disease-causing allele depends on the per-allele disease risk; and target size T, 
the summed lengths of the genomic regions within which mutations can influence 
T2D risk. Specifically, a variant’s additive contribution to disease risk g is given by 
$g = s^{\tau}(1 + \varepsilon)$, where s is the selection coefficient under which the variant evolves 
and ε is drawn from a normal distribution. 
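As a minimal sketch of this effect-size model, the code below draws selection coefficients from a gamma distribution and applies g = s^τ(1 + ε). The gamma shape and scale and the standard deviation of ε are hypothetical placeholders, since the text reports that these parameters were fitted but does not give their values.

```python
import random

def draw_effect_sizes(n, tau, shape, scale, eps_sd, seed=0):
    """Return (s, g) pairs with g = s**tau * (1 + eps), eps ~ Normal(0, eps_sd).

    shape, scale and eps_sd are illustrative assumptions, not study values.
    """
    rng = random.Random(seed)
    variants = []
    for _ in range(n):
        s = rng.gammavariate(shape, scale)   # selection coefficient
        eps = rng.gauss(0.0, eps_sd)         # noise around the coupling
        g = (s ** tau) * (1.0 + eps)         # additive contribution to risk
        variants.append((s, g))
    return variants

# tau = 0 decouples effect size from selection; larger tau couples them.
for tau in (0.0, 0.1, 0.3, 0.5):
    sample = draw_effect_sizes(3, tau, shape=0.3, scale=0.01, eps_sd=0.5)
    print(tau, [round(g, 4) for _, g in sample])
```

With τ = 0 every variant has the same expected effect regardless of s, which corresponds to the weak-dependence end of the model range.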

By varying τ and T, we generated a wide range of joint distributions for allele 
frequency and effect size. In total, we evaluated 12 models: τ = 0, 0.1, 0.3, and 0.5 
crossed with T = 750 kb, 2.0 Mb, and 3.75 Mb. Under models with higher selection 
against strongly deleterious alleles (larger τ), rare variants explain the bulk of 
heritability and can have large effects, while under models with weak dependence 
(smaller τ), common variants explain the bulk of heritability and rare variants 
collectively have weaker effects. Although we had previously excluded many 
models as producing predictions inconsistent with observed sibling relative risk, 
GWAS, and linkage results, prior work showed that models varying widely in the 
proportion of total heritability attributable to rare versus common variation were 
still plausible. In this study, we explored whether the space of plausible disease 
models could be further constrained using whole genome sequence, imputation, 
and meta-analysis results. 

Simulation procedure. ForSim enables simulation of variants across user-specified 
loci in large populations. Inputs include a demographic history (trained on 
European sequence data) and a gamma distribution of selection coefficients for a 
subset of variants under natural selection. We simulated genotypes for a current 
population of effective size 500,000 individuals and selected potential disease 
risk variants from those under selection appropriate to the intended target size. 
Each risk variant received a disease-specific effect size depending on the selection 
coefficient under which it evolved and the assumed degree of dependence between 
selection and effect size. Each individual was then designated as case or control 
depending on his/her cumulative genetic risk score plus a random environmental 
risk component chosen to achieve the estimated T2D heritability of ~45%. From 
this population simulated with both phenotypes and genotypes, we selected appro- 
priate numbers of cases and controls and conducted single-variant association tests 
in order to compare the distribution of P values from simulation to that observed 
in the current study. Results shown are the average of 25 independent simulation 
replicates for each disease model. 
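The case/control designation step can be sketched as a liability-threshold assignment: environmental noise is scaled so that genetic variance makes up the target heritability, and the top fraction of liabilities equal to the prevalence are labelled cases. This is an illustrative simplification with hypothetical function names, not the ForSim implementation.

```python
import math
import random

def assign_case_status(genetic_scores, heritability=0.45, prevalence=0.08,
                       seed=0):
    """Label individuals as cases (True) or controls (False).

    Environmental variance is chosen so that
    var_g / (var_g + var_e) equals the target heritability.
    """
    rng = random.Random(seed)
    n = len(genetic_scores)
    mean = sum(genetic_scores) / n
    var_g = sum((x - mean) ** 2 for x in genetic_scores) / n
    var_e = var_g * (1.0 - heritability) / heritability
    sd_e = math.sqrt(var_e)
    liability = [x + rng.gauss(0.0, sd_e) for x in genetic_scores]
    n_cases = max(1, round(prevalence * n))
    # Threshold is the liability of the n_cases-th highest individual.
    cutoff = sorted(liability, reverse=True)[n_cases - 1]
    return [value >= cutoff for value in liability]

# Hypothetical cumulative genetic risk scores for 1,000 individuals.
score_rng = random.Random(42)
scores = [score_rng.gauss(0.0, 1.0) for _ in range(1000)]
status = assign_case_status(scores)
print(sum(status), "cases out of", len(status))
```

Cases and controls for the association tests would then be sampled from the two resulting groups.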

Comparison of simulated outcomes to empirical T2D results. We focused on comparing 
simulated outcomes under three disease models, each of which was previously 
found to be consistent with sibling relative risk, GWAS, and linkage results for 
T2D, but which vary widely in causal variant properties (Fig. 3): a rare-variant model in 
which rare variants explain ~75% of T2D heritability (small target size T = 750 kb 
and moderate dependence between effect size and selection, τ = 0.5), an intermediate 
model in which rare, low-frequency, and common variants all contribute 
significantly to T2D heritability (T = 2.0 Mb and τ = 0.3), and a common polygenic 
model in which common variants explain ~75% of T2D heritability (T = 3.75 Mb 
and weak dependence, τ = 0.1). We first compared the simulated outcomes of a 
whole-genome sequencing study in ~3,000 samples under each model. All three 
models predicted similar distributions of variant association test statistics using 
the sequenced individuals alone (data not shown). 

However, the predictions began to diverge when we simulated imputation into 
GWAS samples and studied the distribution of test statistics after meta-analysis. 
For each simulated model, we sampled 14,175 cases and 14,175 controls (to match 
the effective sample size of the actual imputation cohorts used for meta-analysis). 
Because genotyping accuracy in simulated samples is perfect (unlike in imputation), 
we calculated average imputation quality as a function of MAC in the 
empirical data (using the r² value reported by the imputation software that was 
used in each cohort). We then corrected, for each variant, the association test 
statistic in simulated data by multiplying the χ² value by the average imputation 
r² for the variant MAC. We then re-computed association P values from the corrected 
chi-squared statistics to compare P value distributions in simulated versus 
empirical data. We plotted the distribution of association P values for variants of 
different frequency classes in a quantile-quantile (QQ) plot, and compared these 
curves to the empirical T2D results (Fig. 3). Focusing on low-frequency variants, 
we also asked how many unique low-frequency signals achieved significant asso- 
ciation to T2D risk under each simulated model, and compared these quantities 
to empirical observation (Fig. 3). These analyses demonstrate that the interme- 
diate and rare-variant models produce an excess of association signal among 
low-frequency variants compared to observation, whereas the common poly- 
genic model is consistent with the genome-wide distribution of association signals 
observed. 
Extended Data Figure 2 | Power for single and aggregate variant 
association. a-g, Power to detect single-variant association (α = 5 × 10⁻⁸) 
at varying minor allele frequencies (x-axis) and allelic ORs (y-axis) for 
seven effective sample size (N_eff) scenarios relevant to the genomes (a-c) 
and exomes (d-g) components of this project. a, Variant observed in 
2,657 samples (the effective size of the GoT2D integrated panel). b, Variant 
observed in 28,350 samples (the effective size of the imputed data set). 
c, Variant observed in the GoT2D integrated panel and the imputed data 
set (effective sample size 31,007). d, Ancestry-specific variant in 2,000 
samples (the size of each of the non-European exome sequence data sets). 
e, European-specific variant in 5,000 samples (the combined size of the 
European exome sequence data sets). f, Variant observed with shared 
frequency across all ancestry groups in 12,940 samples (the size of the 
combined exome sequence data set). g, Variant observed in the combined 
exome array and sequencing data set (effective sample size 82,758). 
h, i, Power for gene-based test of association (SKAT-O) according to 
liability variance explained. In h, 50% of the variants contribute to disease 
risk and the remaining 50% have no effect on disease risk; in i, 100% of 
the variants contribute to disease risk. For each, sample sizes considered 
are 2,000 (ancestry-specific effects; green) and 12,940 (ancestry-shared 
effects; blue). Power is shown for two levels of significance (α = 2.5 × 10⁻⁶ 
and α = 0.001). From these simulation studies, it is clear that under the 
optimistic model, where effects are shared across all ethnicities (blue line) 
and all variants contribute, power is >60% for 1% variance explained 
and α = 2.5 × 10⁻⁶. However, power declines rapidly if either criterion 
is relaxed. 
Extended Data Figure 3 | Single variant analyses. a-c, Manhattan plot 
of single-variant analyses generated from exome sequence data in 6,504 
cases and 6,436 controls of African American, East Asian, European, 
Hispanic, and South Asian ancestry (a); exome array genotypes in 28,305 
cases and 51,549 controls of European ancestry (b); and combined 
meta-analysis of exome array and exome sequence samples (c). Coding 
[...] reported lead variant from GWAS region. Loci achieving genome-wide 
significance only in the combined analysis are highlighted in bold. 
The HNF1A variant reaching genome-wide significance in the combined 
analysis is a synonymous variant (Thr515Thr). The dashed horizontal 
line in each panel designates the threshold for genome-wide significance 
(P < 5 × 10⁻⁸). 
Extended Data Figure 4 | Classification of coding variants according 
to their relationship to reported lead variants for each GWAS region. 
The ideogram shows the location of 25 coding variant associations at 

16 loci described in the text. The number in each circle corresponds to 
the number of associated variants at each locus. Variants are grouped 
into five categories based on inferred relationship with the GWAS lead 
variant. For some of these categories, the figure includes representative 
regional association plots based on exome array meta-analysis data from 
28,305 cases and 51,549 controls. The locus displayed for each category is 
designated in bold. The first plot in each panel shows the unconditional 
association results; the middle plot the association results after 
conditioning on the non-coding GWAS SNP; and the last plot the results 
after conditioning on the most significantly associated coding variant. 


Each point represents an SNP in the exome array meta-analysis, plotted 
with its P value (on a −log10 scale) as a function of the genomic position 
(hg19). In each panel, the lead coding variant is represented by the purple 
symbol. The colour-coding of all other SNPs indicates LD with the lead 
SNP (estimated by European r² from 1000G March 2012 reference panel: 
red r² > 0.8; gold 0.6 < r² < 0.8; green 0.4 < r² < 0.6; cyan 0.2 < r² < 0.4; 
blue r² < 0.2; grey r² unknown). Gene annotations are taken from the 
University of California Santa Cruz genome browser. GWS: genome-wide 
significance. *Seven variants, three at ASCC2, and one each at THADA, 
TSPAN8, FES and HNF4A did not achieve genome-wide significance 
themselves, but are included because they fall into genes and/or regions 
with other significant association signals (see text). 
Extended Data Figure 5 | Exclusion of synthetic associations and 
construction of credible causal variant sets at T2D GWAS loci. Ten T2D 
GWAS loci were selected for synthetic association testing (P < 0.001; 
see Methods). a, The effect size observed at the GWAS index SNV 
(sequence data) before (navy blue) and after (light blue, grey) conditioning 
on candidate rare and low-frequency (MAF < 5%) variants which could 
produce synthetic association. b, Example of synthetic association 
exclusion at the TCF7L2 locus. Error bars represent 95% confidence 
intervals for the index SNP odds ratio as rare variants are greedily added to 
the model. c, The size of credible sets at T2D GWAS loci when constructed 
from the GoT2D data, compared to the sizes when restricted to variants in 
the 1000G or HapMap data. 
Extended Data Figure 6 | Genome enrichment analysis in GoT2D whole 
genome sequence data. n = 2,657. a, Functional annotation categories 
were defined using transcription, chromatin state and transcription 
factor binding data from GENCODE, ENCODE and other studies. 
b, T2D association statistics for variants at each T2D locus were jointly 
modelled with functional annotation using fgwas. In the resulting model 
we identified enrichment of coding exons (CDS), transcription factor 
binding sites (TFBS), mature adipose active enhancers and promoters 
(hASC-t4 EnhA, TssA), pancreatic islet active and weak enhancers (HI 
EnhA, EnhWk), pre-adipose active and weak enhancers (hASC-t1 EnhA, 
EnhWk), embryonic stem cell active promoters (H1-hESC TssA) and 
5′UTRs. Dots represent enrichment estimates and horizontal lines the 
95% confidence intervals. c, At the CCND2 locus, three variants not 
present in HapMap2 have a combined 90% posterior probability of 
being causal (rs4238013, rs3217801, rs73040004). One of these variants, 
rs3217801, is a 2-bp indel that overlaps an islet enhancer element. 
Extended Data Figure 7 | Low frequency variants in exome array data. 
Results from meta-analysis of 43,045 low-frequency and common coding 
variants on the exome array (assayed in 79,854 European subjects). 

a, Observed allelic ORs as a property of allele MAF. Variants missing in 
more than eight cohorts or polymorphic in only one cohort were excluded. 
Coloured lines represent contours for liability variance explained. Regions 
shaded grey denote ranges of OR and MAF consistent with 80% power 
(in this case, at α = 5 × 10⁻⁷) to detect single-variant associations in this 
data set (given the observed range of missing data). Variants with a black 
collar are those highlighted by a bounding analysis as having a probability 
>0.8 of having liability-scale variance (LVE) > 0.1%. b, Distribution of each 
variant in the MAF/OR space was computed by assuming T2D prevalence 
of 8% and a beta and normal distribution for MAF and OR, respectively. 
Probability is obtained by integrating the joint MAF-OR distributions over 
ranges of LVE. c, Single variant association, liability and bounding results 
for the known T2D GWAS variants on the exome array (see Methods). 
Extended Data Table 1 | Summary information for sample sets used in the association analyses 

Ancestry  Study  Countries of Origin  Num. of Cases (% female)  Num. of Controls (% female)  Effective Sample Size 
Whole Genome Sequencing Studies 
European Finland-United States Investigation of NIDDM Genetics (FUSION) Study Finland 493 (41.5) 486 (45.2) 979 
European Kooperative Gesundheitsforschung in der Region Augsburg (KORA) Germany 101 (44.5) 104 (66.3) 205 
European Malmo-Botnia Study Finland, Sweden 410 (51.5) 419 (44.1) 829 
European UK Type 2 Diabetes Genetics Consortium (UKT2D) UK 322 (46.2) 322 (82.2) 644 
Total Whole Genome Sequence 1,326 1,331 2,657 
Genome-Wide Array Studies 
European INTERACT Re i lg ae ux 4624 (51.8) 4668 (64.2) 9292 
European Wellcome Trust Case Control Consortium (WTCCC) UK 1586 (40.9) 2938 (50.8) 4120 
European Kooperative Gesundheitsforschung in der Region Augsburg (KORA) Germany 993 (45.1) 2985 (52.2) 2980 
European Framingham Heart Study (FHS) US 673 (42.6) 7660 (55.1) 2475 
European Finland-United States Investigation of NIDDM Genetics (FUSION) Study Finland 1060 (43.1) 1090 (51.3) 2150 
European Diabetes Genetics Initiative (DGI) Finland, Sweden 899 (46.6) 1057 (49.6) 1943 
European Estonian Genome Center, University of Tartu (EGCUT-OMNI) Estonia 389 (58.6) 6013 (54.2) 1461 
European Diabetes Gene Discovery Group (DGDG) France, Canada 677 (39.3) 697 (59.7) 1374 
European Mt Sinai BioMe Biobank Platform (BioMe (Illumina)) US 255 (29.0) 1647 (51.4) 883 
European Uppsala Longitudinal Study of Adult Men (ULSAM) Sweden 166 (0) 953 (0) 565 
European Mt Sinai BioMe Biobank Platform (BioMe) US 132 (26.5) 455 (34.7) 409 
European Prospective Investigation of the Vasculature in Uppsala Seniors (PIVUS) Sweden 111 (41.4) 838 (51.2) 392 
European Estonian Genome Center, University of Tartu (EGCUT-370) Estonia 80 (48.8) 1768 (51) 306 
Total Genome-Wide Array 11,645 32,769 28,350 
Total Whole Genome Sequence + Genome-Wide Array 12,971 34,100 31,007 
Whole Exome Sequencing Studies 
African American Jackson Heart Study US 500 (66.6) 526 (63.3) 1,026 
African American Wake Forest School of Medicine Study US 518 (59.5) 530 (56.0) 1,048 
East Asian Korea Association Research Project Korea 526 (45.6) 561 (58.5) 1,086 
East Asian Singapore Diabetes Cohort Study; Singapore Prospective Study Program Singapore (Chinese) 486 (52.1) 592 (61.3) 1,068 
European Ashkenazi US, Israel 506 (47.0) 355 (56.9) 834 
European Metabolic Syndrome in Men Study (METSIM) Finland 484 (0) 498 (0) 982 
European Finland-United States Investigation of NIDDM Genetics (FUSION) Study Finland 472 (42.6) 476 (45.0) 948 
European Kooperative Gesundheitsforschung in der Region Augsburg (KORA) Germany 97 (44.3) 90 (63.3) 186 
European UK Type 2 Diabetes Genetics Consortium (UKT2D) UK 322 (45.7) 320 (82.8) 642 
European Malmo-Botnia Study Finland, Sweden 478 (54.8) 443 (43.8) 920 
Hispanic San Antonio Family Heart Study, San Antonio Family Diabetes/Gallbladder Study, Veterans Administration Genetic Epidemiology Study, and the Investigation of Nephropathy and Diabetes Study Family Component US 272 (58.8) 218 (58.7) 484 
Hispanic Starr County, Texas US 749 (59.7) 704 (71.9) 1,452 
South Asian London Life Sciences Population Study (LOLIPOP) UK (Indian Asian) 531 (14.1) 538 (15.8) 1,068 
South Asian Singapore Indian Eye Study Singapore (Indian Asian) 563 (44.4) 585 (49.2) 1,148 
Total Whole Exome Sequence 6,504 6,436 12,892 
Exome Array Studies 
European ADDITION; Steno as Havel Health06; Health08; Vejle Denmark 5813 (40.0) 7987 (54.4) 13,458 
Wellcome Trust Case Control Consortium (UK Type 2 Diabetes 
Consortium); Young Diabetics Study (YDX); Genetics of Diabetes and 
European Audit peti Tayside Study reste: Oxford Biobank; TwinsUK; UK S576(61.7) 12675,(4:122) 11,156 
1958 Birth Cohort (BC58) 
European Finland-United States Investigation of NIDDM Genetics (FUSION) Study; Finrisk2007; Metabolic Syndrome in Men Study (METSIM); Dose-Responses to Exercise Training (DR’sEXTRA); D2D2007 Finland 3593 (33.4) 8222 (26.0) 10,001 
European Malmo Diabetes Cohort (MDC); All New Diabetics in Skane (ANDIS) Sweden 4633(41.0) 5404 (59.5) 9,978 
European Prevalence, aig for Pe re cand (PPP); Diabetes Finland 2910 (43.7) 4596 (53.7) 7,127 
European Nurses’ Health Study (NHS) US 1413 (100.0) 1695 (100.0) 3,082 
European Health Professionals Follow-up Study (HPFS) US 1184 (0.0) 1287 (0.0) 2,467 
European The Exeter Family Study of Child Health (EFSOCH) UK 1446 (39.0) 1567 (52.0) 3,008 
European Kooperative Gesundheitsforschung in der Region Augsburg (KORA) Germany 933 (45.3) 2705 (51.7) 2,775 
European Estonian Genome Center at the University of Tartu (EGCUT) Estonia 882 (43.7) 1506 (44.2) 2,225 
European Gene-Lifestyle aT Ran Ccleny Involved in Elevated Sweden 960 (47.6) 957 (54.5) 1,917 
European Fenland cohort of the European Te Investigation of Cancer (Fen- UK 691(47.0) 1157 (54.5) 1,730 
The Prospective Investigation of the Vasculature in Uppsala Seniors 
European (PIVUS). Uppsala Lontclinal Study of Adult Men (ULSAM) Silt SIAIGO) AIST (289) 942 
Total Exome Array 28,305 51,549 69,866 
Total Whole Exome Sequence + Exome Array 34,809 57,985 82,758 
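The table does not state how the effective sample size was computed. The usual case-control definition, 4 / (1/N_cases + 1/N_controls), reproduces the whole-genome sequencing and genome-wide array rows checked below; it is offered here as an assumption, and it does not appear to match every exome sequencing row, so those may have been derived differently.

```python
def effective_sample_size(n_cases, n_controls):
    """Effective sample size of an unbalanced case-control study."""
    return 4.0 / (1.0 / n_cases + 1.0 / n_controls)

# Rows taken from the table above: (study, cases, controls, reported N_eff).
rows = [
    ("FUSION (genomes)", 493, 486, 979),
    ("WTCCC (array)", 1586, 2938, 4120),
    ("FHS (array)", 673, 7660, 2475),
]
for study, cases, controls, reported in rows:
    print(study, round(effective_sample_size(cases, controls)), reported)
```

The formula shows why adding many controls to a fixed number of cases gives diminishing returns, as in the FHS row.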
Extended Data Table 2 | Counts and properties of variants identified in sequenced subjects 

a 

Genomes integrated panel 

Variant type, N (% total): SNV 25.2M (94%); Indel 1.50M (5.6%); SV 8,876 (0.03%) 
Function, N (% total): Coding 888K (3.3%); Non-coding 25.8M (97%) 
Frequency spectrum, N (% total): Rare (MAF<0.5%) 6.26M (23%); Low frequency (0.5<MAF<5%) 4.16M (16%); Common (MAF>5%) 16.3M (61%) 
dbSNP, N (% total): b137 14.6M (55%); Novel 12.1M (45%) 

Exome sequence data 
Allsamples African-American East-Asian European Hispanic South-Asian 


Samples: 13,008 2,086 2,165 4,579 1,959 2,219 
T2D cases 6,504 1,018 1,012 2,359 1,021 1,094 
T2D controls 6,436 1,056 1,153 2,182 922 1,123 

Excluded from association analysis 68 12 0 38 16 2 

Coverage: 

Coding: 

Mean (Mc) per gene 81.7 +23.7 83.2 +24.0 84.6 +23.8 78.6 +23.3 83.8 +24.1 78.2 +23.2 
# of genes with Mc<20 368 302 302 351 269 325 
Non-coding: 

Mean per gene 59.0 +21.0 60.9 +21.5 62.2 +21.6 57.5 +20.6 59.2 +21.2 55.4 +20.3 
# of genes with Mc<20 1,150 738 731 1,102 804 945 

Variant annotations: 

Synonymous SNV 627,630 237,430 178,232 192,282 156,231 211,218 
Missense SNV 1,110,897 354,797 296,707 327,049 231,351 344,191 
Start SNV 2,055 593 523 639 384 583 
Nonsense SNV 26,321 7,188 6,668 8,030 4,660 7,339 
Frameshift INDEL 26,901 6,605 6,159 7,515 4,155 6,609 
Inframe INDEL 11,090 3,471 2,963 3,145 2,068 3,165 
3'UTR SNV, INDEL 65,013 24,583 19,149 21,102 16,959 22,177 
5'UTR SNV, INDEL 43,965 16,920 13,520 15,562 11,634 15,595 
Intron SNV, INDEL 931,449 352,398 270,564 296,970 243,139 314,810 
Essential splicing SNV, INDEL 14,286 3,648 3,454 4,108 2,301 3,744 
Other splicing SNV, INDEL 128,644 45,876 35,413 38,263 30,301 41,122 
Non-coding RNA SNV, INDEL 18,113 7,247 5,996 6,715 5,084 6,706 
Intergenic SNV, INDEL 37,345 14,335 11,498 13,614 10,700 12,937 
All 3,043,709 1,075,091 850,846 934,994 718,967 990,196 

Coding frequency spectrum: 

Rare (MAF<0.5%) 95.79% 83.30% 90.06% 89.19% 84.56% 89.89% 
private 77.93% 53.79% 65.47% 51.80% 37.26% 61.55% 
cosmopolitan 0.35% 1.80% 3.02% 1.88% 2.24% 1.73% 

Low frequency (0.5<MAF<5%) 2.57% 10.36% 4.61% 5.52% 8.21% 5.10% 
private 0.17% 1.43% 1.10% 0.26% 0.52% 1.02% 
cosmopolitan 0.60% 1.50% 1.54% 1.94% 2.74% 1.62% 

Common (MAF>5%) 1.65% 6.35% 5.33% 5.29% 7.23% 5.00% 
private 0.09% 0.00% 0.00% 0.00% 0.01% 0.00% 
cosmopolitan 1.50% 4.35% 5.17% 4.97% 6.88% 4.86% 

Intron/UTR frequency spectrum: 

Rare (MAF<0.5%) 94.09% 78.68% 86.91% 86.17% 81.43% 86.68% 
private 74.76% 49.81% 61.36% 45.26% 31.03% 56.96% 
cosmopolitan 0.46% 2.07% 3.98% 2.49% 2.66% 2.19% 

Low frequency (0.5<MAF<5%) 3.52% 12.57% 5.63% 6.51% 9.43% 6.32% 
private 0.25% 1.74% 1.25% 0.29% 0.47% 1.18% 
cosmopolitan 0.80% 1.81% 2.11% 2.53% 3.30% 2.17% 

Common (MAF>5%) 2.39% 8.76% 7.46% 7.32% 9.14% 7.00% 
private 0.15% 0.00% 0.00% 0.01% 0.00% 0.00% 
cosmopolitan 2.17% 5.94% 7.26% 6.93% 8.77% 6.81% 


a, Variant numbers for the 2,657 individuals with whole-genome sequence data passing QC and included in the association analysis data set. b, Variant numbers are provided for the 13,008 
individuals passing initial rounds of QC from which further QC defined the 12,940 subjects included in the association analysis data set. Private refers to variants seen in only a single ancestral group; 
cosmopolitan to variants seen in all five major ancestry groups. 
Extended Data Table 3 | Characterization of variant associations through conditional analysis 
Locus Variant MAF Unconditional and conditional association p-values Interpretation 


Non-coding variant associations characterized in 38,738 samples in GoT2D genome wide imputed meta analysis 


IRS1 rs78124264 rs7578326 rs2943640 rs2943641 The association signal rs78124264 and the 
rs78124264 0.022 8.5x10° 2.5x107" 2.5x107" 2.5x10’"  GWAS SNPs at this locus are distinct. Signals 
rs7578326† 0.35 1.2x107 1.1x10° n.d. n.d. are not extinguished in reciprocal conditional 
rs2943640† 0.35 2.5x10"! n.d. 4.5x10"° n.d. analysis. Previous GWAS signals are not 
rs2943641† 0.36 9.0x10"'? n.d. n.d. 1.5x10'° mediated through rs78124264, which represents 

a distinct association signal at this locus. 
PPARG rs79856023 rs1801282 The association signal rs79856023 and the 
rs79856023 0.022 1.2x10* 9.2x10" GWAS SNP at this locus are distinct. Signals are 
rs1801282" 0.13 1.6x10° 1.2x10° not extinguished in reciprocal conditional 


analysis. Previous GWAS signal is not mediated 
through rs79856023, which represents a distinct 
association signal at this locus. 


Coding variant associations characterized in 28,305 cases and 51,549 controls typed on exome array 


PAM Asp563Gly Ser1207Gly Association signals for PAM Asp563Gly and 
(PAM) (PPIP5K2) PPIP5K2 Ser1207Gly are indistinguishable in 
Asp563Gly 0.054 1.7x10” 0.24 reciprocal conditional analysis. Gene biology, as 
Ser1207Gly 0.054 0.30 1.0x10° well as previous reports of additional PAM 


variants associated with T2D in Icelandic 
cohorts, highlights PAM as the probable 
transcript at this locus. 


MTMR3- Asn960Ser Val123Ile Asp407His Pro423Ser Association signals for the MTMR3 and ASCC2 
ASCC2 (MTMR3) (ASCC2) (ASCC2) (ASCC2) coding variants are indistinguishable in 
Asn960Ser 0.083 3.2x10° 0.022 0.027 0.022 reciprocal conditional analysis. The MTMR3 
Val123Ile 0.083 0.15 2.0x10° 0.066 0.76 Asn960Ser variant has the strongest signal, and 
Asp407His 0.083 0.18 0.99 1.9x10° 0.88 highlights MTMR3 as the most likely effector 
Pro423Ser 0.083 0.18 0.67 0.98 2.0x10° transcript at this locus. 
KCNJ11 Val337Ile Lys23Glu Ala1369Ser Association signals for KCNJ11 Val337Ile and 
-ABCC8 (KCNJ11) (KCNJ11) (ABCC8) Lys23Glu and ABCC8 Ala1369Ser are 
Val337Ile 0.40 3.4x10° 0.17 0.049 indistinguishable in reciprocal conditional 
Lys23Glu 0.40 0.48 5.1x10° 0.082 analysis. The relative causal contributions of the 
Ala1369Ser 0.40 0.68 0.84 2.3x10° two genes, making up the two components of 


the sulfonylurea-responsive potassium channel, 
are indistinguishable on statistical grounds. 


WFS1 Val333Ile Asn500Asn Arg611His rs4689388 Association signals for the WFS1 coding variants 
Val333Ile 0.30 9.3x10* 0.024 0.00070 0.0030 are indistinguishable from each other and the 
Asn500Asn 0.41 0.0070 2.0x10"" 0.0049 0.027 previously reported non-coding GWAS SNP at 
Arg611His 0.47 0.020 0.62 1.3x10 0.19 this locus in reciprocal conditional analysis. 
rs4689388† 0.43 0.011 0.62 0.024 23x10" WFS1 is the likely effector transcript for the non- 


coding GWAS signal at this locus, although the 
causal variant in the gene is unclear. 


CILP2- Glu167Lys —_—_rs10401969 Association signals for TM6SF2 Glu167Lys and 

TM6SF2 (TM6SF2) the previously reported non-coding GWAS SNP 
Glu167Lys 0.082 1.9x10” 0.52 at this locus are indistinguishable from each 

rs10401969" 0.083 0.62 4.2x10” other in reciprocal conditional analysis. TM6SF2 


is the probable effector transcript for the non- 
coding GWAS signal at this locus, with the effect 
mediated through Glu167Lys. 


GRB14- Asn939Asp __rs13389219 Association signals for COBLL1 Asn939Asp and 

COBLL1 (COBLL1) the previously reported non-coding GWAS SNP 
Asn939Asp 0.12 4.7x10"" 3.00x10° at this locus are partially correlated. The 
rs13389219† 0.39 7.0x10° 1.9x10"° association signal for the GWAS signal is not 


entirely extinguished in reciprocal conditional 
analysis. COBLL1 is a candidate effector 
transcript for the GWAS signal at this locus. 


Coding variant associations characterized in 44,414 samples in GoT2D genome wide imputed meta analysis 


THADA Cys1605Tyr rs10203174 Association signals THADA Cys1605Tyr and the 
Cys1605Tyr 0.10 0.00035 0.92 GWAS SNP are partially correlated. The 
rs10203174" 0.10 0.0063 5.7x10° association signal for the GWAS SNP is not 


entirely extinguished in reciprocal conditional 
analysis. THADA is a candidate effector 
transcript for the GWAS signal at this locus. 


RREB1 Asp1171Asn —_rs9502570 The association signals of RREB1 Asp1171Asn 
Asp1171Asn_ 0.11 0.0018 0.0017 and the GWAS SNP at this locus are distinct. 
rs9502570' 0.28 0.0037 0.0042 The association signal is not extinguished in 


reciprocal conditional analysis. Previous GWAS 
signal is not mediated through RREB1 
Asp1171Asn. RREB1 Asp1171Asn represents a 


distinct association signal at this locus. 


For each locus, significantly associated SNVs are presented. Unconditional P values are given in italics, and conditional P values are shown for each pair of SNVs (P values are for SNVs in the Variant 
column, with SNVs listed in header included as covariates in association analysis). The IRS1 and PPARG non-coding associations were characterized using exact conditional analysis in 38,738 samples 
from the GoT2D genome-wide imputed meta-analysis. Conditional analysis for coding variant associations was, for most loci, restricted to the exome array genotypes (28,305 cases, 51,549 controls). 
At THADA and RREB1, neither the non-coding lead GWAS SNVs nor close proxies were typed on the exome array, so approximate conditional analyses were undertaken using GCTA in 44,414 samples 
from the GoT2D genome-wide imputed meta-analysis (see Methods). For several of these loci, unconditional association P values do not reach genome-wide significance as sample sizes 
are smaller. At the GPSM1 locus, the previously reported GWAS SNV was not available on exome array and too poorly imputed in the GoT2D meta-analysis to allow meaningful inference. 

*Conditional analysis was performed once for rs78124264 with all three previously known GWAS variants included as covariates. 

†Non-coding GWAS lead variant. n.d., not determined. 

Extended Data Table 4 | Testing for synthetic associations across GWAS-identified T2D loci 


Column groups: index SNV association signal; synthetic association by missense variants; synthetic association by all low-frequency (LF) and rare variants across 5Mb region.

Gene | Index SNV | MAF | Before inclusion of missense variants: OR [95% interval] | P value | Number of missense variants | After inclusion of missense variants: OR [95% interval] | P value | Relative likelihood of LF model | Best LF variant | MAF | After inclusion of single best variant: OR [95% interval] | P value | n1 | n2
TCF7L2 | 10:114758349 | 0.27 | 1.75 [1.54-1.99] | 2.80x10%8 | 6 | 1.73 [1.52-1.97] | 2.33x10' | 1.8x10'” | 10:114787948 | 1.6% | 1.72 [1.51-1.95] | 1.62x10°° | >50 | 35
ADCY5 | 3:123065778 | 0.19 | 0.69 [0.60-0.79] | 1.12x107” | 13 | 0.70 [0.61-0.81] | 9.00x10% | 9.7x10° | 3:123096056 | 2.5% | 0.71 [0.61-0.82] | 3.04x10° | 13 | 6
IRS1 | 2:227093745 | 0.36 | 0.76 [0.68-0.86] | 2.80x10° | 5 | 0.77 [0.69-0.86] | 4.30x10% | 4.5x107 | 2:226993370 | 1.7% | 0.78 [0.70-0.88] | 2.19x10° | 12 | 6
KCNQ1 | 11:2847069 | 0.45 | 0.78 [0.70-0.87] | 1.22x10° | >50 | 0.84 [0.75-0.94] | 2.07x10° | 1.0x107 | 11:2825279 | 4.7% | 0.81 [0.71-0.91] | 3.19x10* | 16 | 6
CDC123-CAMK1D | 10:12307894 | 0.25 | 1.33 [1.17-1.52] | 1.19x10° | 4 | 1.30 [1.13-1.50] | 2.06x10* | 7.1x10° | 10:12325477 | 3.8% | 1.29 [1.12-1.48] | 3.03x10* | 10 | 5
CDKN2A-CDKN2B | 9:22137685 | 0.28 | 1.28 [1.14-1.45] | 4.52x10° | 4 | 1.27 [1.13-1.43] | 9.28x10° | 4.3x10° | 9:22133773 | 3.5% | 1.25 [1.10-1.41] | 5.98x10* | 22 | 7
IGF2BP2 | 3:185511687 | 0.32 | 1.25 [1.11-1.41] | 1.65x10* | 14 | 1.21 [1.07-1.36] | 2.12x10° | 3.0x10* | 3:185550500 | 4.1% | 1.20 [1.07-1.36] | 2.91x10° | 8 | 3
KLHDC5 | 12:27965150 | 0.17 | 0.76 [0.66-0.88] | 2.19x10% | 3 | 0.77 [0.66-0.89] | 4.45x10* | 1.2x10° | 12:27832062 | 2.0% | 0.80 [0.68-0.92] | 3.04x10° | 10 | 4
SLC30A8 | 8:118184783 | 0.33 | 0.81 [0.72-0.91] | 2.9510" | 2 | 0.81 [0.72-0.91] | 3.73x10" | 0.02 | 8:117964024 | 2.2% | 0.83 [0.73-0.93] | 1.23x10° | 17 | 6
CDKAL1 | 6:20694884 | 0.18 | 1.28 [1.11-1.48] | 6.05x10° | 1 | 1.28 [1.11-1.48] | 7.57x10* | 0.007 | 6:20718780 | 2.8% | Dee [1.06-1.43] | 7.71x10° | 9 | 3


Gene names refer to protein-coding transcript(s) closest to the index SNV. Reported index SNVs are the previously reported GWAS variants (in European populations) with the strongest association 
signal in the GoT2D sequencing data (n = 2,657). Relative likelihoods are based on causal models with only the chosen low-frequency and rare missense variants, relative to models with only the 
GWAS index SNV, assessed using the Akaike information criterion (AIC) of each regression model, calculated as exp[(AIC_index − AIC_low-frequency or rare)/2]. n1, number of low-frequency or rare 
variants required for the residual odds ratio at the GWAS index SNV, after joint conditioning on the low-frequency and rare variants, to switch direction of effect. n2, number of low-frequency or rare 
variants required for the association P value remaining at the GWAS index SNV, after joint conditioning on the low-frequency and rare variants, to exceed 0.05. 
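
As a worked illustration of the relative-likelihood calculation described in this legend, the following minimal Python sketch evaluates exp[(AIC_index − AIC_low-frequency or rare)/2]. The function name and the AIC values are hypothetical and chosen only for illustration; they are not taken from the study.

```python
import math


def relative_likelihood(aic_index, aic_low_freq):
    """Relative likelihood of the low-frequency/rare-variant model versus the
    index-SNV model, computed as exp[(AIC_index - AIC_low_freq) / 2]."""
    return math.exp((aic_index - aic_low_freq) / 2.0)


# Hypothetical AIC values (not from the study): the low-frequency model has the
# higher AIC, so its relative likelihood falls below 1.
value = relative_likelihood(aic_index=1000.0, aic_low_freq=1010.0)
print(f"relative likelihood = {value:.4g}")
```

Under this definition, a value well below 1 means that the model containing only low-frequency and rare variants is less well supported than the model containing only the index SNV.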


© 2016 Macmillan Publishers Limited. All rights reserved 


ARTICLE 


doi:10.1038/nature18938 


Structural basis of potent Zika-dengue 
virus antibody cross-neutralization 


Giovanna Barba-Spaeth, Wanwisa Dejnirattisai, Alexander Rouvinski, Marie-Christine Vaney, Iris Medits, 
Arvind Sharma, Etienne Simon-Lorière, Anavaj Sakuntabhai, Van-Mai Cao-Lormeau, Ahmed Haouz, 
Patrick England, Karin Stiasny, Juthathip Mongkolsapaya, Franz X. Heinz, Gavin R. Screaton & Félix A. Rey 


Zika virus is a member of the Flavivirus genus that had not been associated with severe disease in humans until the recent 
outbreaks, when it was linked to microcephaly in newborns in Brazil and to Guillain-Barré syndrome in adults in French 
Polynesia. Zika virus is related to dengue virus, and here we report that a subset of antibodies targeting a conformational 
epitope isolated from patients with dengue virus also potently neutralize Zika virus. The crystal structure of two of these 
antibodies in complex with the envelope protein of Zika virus reveals the details of a conserved epitope, which is also the 
site of interaction of the envelope protein dimer with the precursor membrane (prM) protein during virus maturation. 
Comparison of the Zika and dengue virus immunocomplexes provides a lead for rational, epitope-focused design of a 
universal vaccine capable of eliciting potent cross-neutralizing antibodies to protect simultaneously against both Zika 
and dengue virus infections. 


Zika virus (ZIKV) is an arthropod-borne enveloped virus belonging 
to the Flavivirus genus in the family Flaviviridae, which also includes 
the human pathogenic yellow fever, dengue, West Nile and tick-borne 
encephalitis viruses. Flaviviruses have two structural glycoproteins, 
prM and E (for precursor membrane and envelope proteins, respec- 
tively), which form a heterodimer in the endoplasmic reticulum (ER) 
of the infected cell and drive the budding of spiky immature virions 
into the ER lumen. These particles transit through the cellular secretory 
pathway, during which the trans-Golgi-resident protease furin cleaves 
prM. This processing is required for infectivity, and results in the loss of 
a large fragment of prM and reorganization of E on the virion surface. 
The mature particles have a smooth aspect, with 90 E dimers organized 
with icosahedral symmetry in a ‘herringbone’ pattern. 

Three-dimensional cryo-electron microscopy (cryo-EM) struc- 
tures of the mature ZIKV particles have recently been reported to near 
atomic resolution (3.8 Å), showing that the virus has essentially the 
same organization as the other flaviviruses of known structure, such as 
dengue virus (DENV) and West Nile virus. The E protein is about 
500 amino acids long, with the 400 N-terminal residues forming the 
ectodomain essentially folded as β-sheets with three domains, named I, 
II and III, aligned in a row with domain I at the centre. The conserved 
fusion loop is at the distal end of the rod in domain II, buried at the 
E dimer interface. At the C terminus, the E ectodomain is followed 
by the ‘stem’, featuring two α-helices lying flat on the viral membrane 
(the stem helices), which link to two C-terminal transmembrane 
α-helices. The main distinguishing feature of the ZIKV virion is an 
insertion within a glycosylated loop of E (the ‘150’ loop), which 
protrudes from the mature virion surface. 

Flaviviruses have been grouped into serocomplexes based on 
cross-neutralization studies with polyclonal immune sera. The 
E protein is the main target of neutralizing antibodies, and is also the 


viral fusogen; cleavage of prM allows E to respond to the endosomal 
pH by undergoing a large-scale conformational change that catalyses 
membrane fusion and releases the viral genome into the cytosol. Loss 
of the precursor fragment of prM lets the E protein fluctuate from its 
tight packing at the surface of the virion, transiently exposing otherwise 
buried surfaces. One surface exposed by this ‘breathing’ is the fusion- 
loop epitope (FLE), which is a dominant cross-reactive antigenic site. 
Although antibodies to this site can protect by complement-mediated 
mechanisms, as shown in a mouse model for West Nile virus, they are 
poorly neutralizing and lead to antibody-dependent enhancement, 
thereby aggravating Flavivirus pathogenesis and complicating the 
development of safe and effective vaccines. 

We recently reported the functional and structural characterization 
of a panel of antibodies isolated from patients with dengue disease. 
Most of these antibodies target the FLE, but others target a quaternary 
site readily accessible at the exposed surface of the E protein on the 
virion, at the interface between the two E subunits in the dimer. These 
broadly neutralizing antibodies (bnAbs), termed EDE for E-dimer 
epitope, potently neutralize all four DENV serotypes. Their binding 
site is conserved across serotypes because it is also the interaction site 
of prM with E dimers during transport of the immature virus particles 
through the Golgi apparatus of the cell. There were two subsets of 
EDE antibodies, characterized by a differential requirement for 
glycosylation on the 150 loop for binding. The EDE1 bnAbs bind 
better in the absence of glycan, whereas EDE2 bnAbs bind better when 
the glycan is present. 

In this study, we show that the EDE bnAbs neutralize ZIKV as 
potently as they neutralize DENV. We also find that the FLE antibodies, 
which neutralize DENV although not as potently as the EDE bnAbs, 
do not neutralize ZIKV at concentrations up to 1 μM in spite of a high 
affinity for the recombinant ZIKV E protein. We further describe the 
crystal structures of the ZIKV E protein dimer alone and in complex 
with EDE1 C8 and EDE2 A11, identifying their binding determinants. 

Figure 1 | ZIKV and DENV E protein phylogeny and reactivity with 
DENV-elicited antibodies. a, Phylogenetic trees of the main human 
pathogenic flaviviruses based on the amino acid sequences of the E protein 
(left) and of the polymerase NS5 protein (right). The arthropod vectors 
are differentiated by the background colour. JEV, Japanese encephalitis 
virus; MVEV, Murray Valley encephalitis virus; POWV, Powassan virus; 
SLEV, Saint Louis encephalitis virus; TBEV, tick-borne encephalitis virus; 
YFV, yellow fever virus; WNV, West Nile virus. b, ZIKV sE reactivity 
with human recombinant full-length antibodies FLE P6B10, EDE1 C8 
and EDE2 A11. Left, binding properties were monitored by biolayer 
interferometry (BLI) on Octet-Red (ForteBio). The normalized response 
values expressed as fraction of binding site occupancy are plotted against 
concentrations of ZIKV sE dimer shown at logarithmic scale. Lines 
denote global curve fits used for Kd evaluation (see Extended Data Fig. 1a 
for linear concentration range showing concentration-dependent 
saturation fits). Normalized response values were deduced from individual 
sensorgrams showing the binding properties for EDE1 C8 measured at 
different ZIKV sE concentrations. 


A ZIKV-DENV super serogroup 

Phylogenetic analyses of the main human pathogenic flaviviruses using 
the amino acid sequences of the viral RNA polymerase NS5 indicate 
a clustering of ZIKV with the group of mosquito-borne encephalitic 
viruses. The clustering is different when the amino acid sequences of 
the E protein are considered, with ZIKV branching with the DENV 
group (Fig. 1a). If the sequence clustering extends to the antigenic 
surface of E, antibodies that cross-react with several DENV serotypes 
should also bind ZIKV E. To test this hypothesis, we used biolayer 
interferometry to study the binding properties of a poorly neutraliz- 
ing, cross-reactive FLE antibody and the potently neutralizing EDE 
antibodies for recombinant, soluble ZIKV E ectodomain (ZIKV sE) 
produced in insect cells (see Methods). The FLE antibody (P6B10) 
bound nearly 10-fold more tightly than did EDE1 C8 (apparent disso- 
ciation constant (Kd) values of 1.5 nM versus 9 nM), and nearly 1,000 
times more tightly than EDE2 A11 (Fig. 1b and Extended Data Fig. 1a). 
Consistent with their affinities, we could isolate a complex of ZIKV sE 
with a C8 Fab by size-exclusion chromatography, but not with an A11 
Fab (Extended Data Fig. 1b). 

Neutralization assays in African green monkey kidney (Vero) cells 
using these and other members of the three antibody subsets, showed 
that the EDE1 antibodies strongly neutralize ZIKV, whereas the EDE2 
antibodies were at least one log order less potent. Despite its strong 
affinity, P6B10 did not neutralize ZIKV in the concentration range 
used, nor did either of the two other FLE antibodies tested (Fig. 2). 
The EDE1 bnAbs neutralized the ZIKV African strain HD78788 best, 
with a half-maximum inhibitory concentration (IC50) of about 0.1 nM. 
This strain has over the years been cell-culture-adapted and passaged 
in suckling mouse brain and lacks E glycosylation. The IC50 against the 
PF13 strain, isolated in French Polynesia in 2013 and in which the 
E protein is glycosylated at position 154, was in the nanomolar range 
and comparable to, or lower than, that against the four DENV serotypes 
(Table 1). The EDE2 bnAbs showed no difference in neutralization of 
the two strains, suggesting that the presence of the N154 glycan in the 
ZIKV E protein did not enhance the interaction. 


The ZIKV-EDE bnAbs immune complexes 

We crystallized unliganded ZIKV sE and complexes of ZIKV sE with 
EDE1 C8 and EDE2 A11 with single-chain Fv (scFv) and Fab frag- 
ments, respectively (Extended Data Table 1). In the structure of unli- 
ganded ZIKV sE, the 150 loop is ordered, unlike the unglycosylated 150 
loop in the recently determined structure of the protein produced in 
bacteria and refolded in vitro. In contrast to our insect-cell-secreted 
protein, which is a dimer (Extended Data Fig. 1b), the refolded protein 
was reported to be monomeric in solution, suggesting that the glycan 
may help to structure the loop and promote sE dimerization. 

The antibodies recognize a quaternary epitope in the ZIKV sE dimer 
in the same way that they recognize the DENV serotype 2 (DENV-2) 
sE dimer described earlier. The amino acid residues participating in 
the contacts, for both the ZIKV and DENV-2 structures, are shown in 
Extended Data Fig. 2. As expected, the pattern is very similar, with the 
few differences highlighted in red frames in Extended Data Fig. 2b. 
Both epitopes in the sE dimer are occupied in the case of the complex 
with C8 (Fig. 3a), whereas only one is occupied in the case of A11 
(Fig. 4a). Inspection of the crystal environment showed that a second 
Fab could not be docked at this position without clashing with neigh- 
bouring complexes in the crystal. This observation indicates that crystal 
growth selected for incorporation of sE dimers with a single Fab bound, 
which is facilitated by the low affinity of A11. 

The bnAbs dock on ZIKV sE at different angles than they do on 
DENV-2 sE (see insets in Figs 3a and 4a). In the case of the C8 complex, 


Table 1 | 50% FRNT values of EDE1, EDE2 and FLE antibodies tested against ZIKV and DENV-1-4 

All values are 50% FRNT (nM). 

Antibody | Epitope | ZIKV PF13 | ZIKV HD78788 | DENV-1 | DENV-2 | DENV-3 | DENV-4 
752-2 C8 | EDE1 | 0.095 (±0.026) | 0.015 (±0.003) | 0.39 (±0.21) | 0.24 (±0.06) | 0.64 (±0.08) | 1.13 (±0.14) 
753(3) C10 | EDE1 | 0.063 (±0.016) | 0.013 (±0.025) | 0.54 (±0.04) | 0.18 (±0.02) | 1.89 (±0.79) | 0.08 (±0.03) 
752-2 B2 | EDE1 | 1.062 (±0.362) | 0.021 (±0.004) | 0.32 (±0.05) | 0.23 (±0.02) | 0.22 (±0.09) | 0.44 (±0.14) 
747(4) A11 | EDE2 | 0.904 (±0.191) | 0.506 (±0.102) | 0.11 (±0.01) | 0.07 (±0.03) | 0.11 (±0.02) | 7.79 (±3.19) 
747(4) B7 | EDE2 | 4.31 (±1.47) | 1.17 (±0.180) | 0.10 (±0.01) | 0.11 (±0.02) | 0.12 (±0.03) | 93.19 (±19.15) 
747 C4 | EDE2 | 102 (±25.6) | 11.6 (±2.6) | 0.23 (±0.02) | 0.08 (±0.01) | 0.11 (±0.01) | 0.12 (±0.05) 
758 P6B10 | FLE | No neut. | No neut. | 1.85 (±0.44) | 4.97 (±0.28) | 9.40 (±2.83) | 7.47 (±1.65) 
749 B12 | FLE | No neut. | No neut. | 0.43 (±0.12) | 0.73 (±0.20) | 1.04 (±0.31) | 1.80 (±0.64) 
750-2 C5 | FLE | No neut. | No neut. | 1.08 (±0.21) | 0.76 (±0.46) | 1.40 (±0.25) | 2.30 (±0.02) 

FRNT, focus reduction neutralization test. Data shown are mean values with s.e.m. in parentheses. 

Figure 2 | Neutralization curves using three antibodies each from the 
three subsets FLE, EDE1 and EDE2. Results represent the mean ± s.e.m. 
of four independent experiments performed in triplicate for PF13 and in 
duplicate for HD78788 strains. The two ZIKV strains are in bright colours, 
blue and red, respectively. The neutralization data for the four DENV 


the difference in docking results mainly from an altered curvature of 
the sE dimer. We note that the conformation of ZIKV sE in complex 
with the antibodies is very similar to the one it adopts on the virus 
particle, with roughly 1.5 Å root mean square deviation (r.m.s.d.) for 
790 Cα atoms (see Extended Data Table 2). The unbound ZIKV sE 
crystallized here displays a more distant conformation (2.5 Å r.m.s.d. 
when comparing to both virion ZIKV E and either of the sE antibody 
complexes), suggesting that the antibodies stabilize a conformation 
close to that on the viral particle. By contrast, the same comparisons 
done for DENV-2 sE, alone or in complex with the bnAbs, result in 
r.m.s.d. values of 5-7 Å with respect to the E conformation on the 
DENV virion observed by cryo-EM. For comparison, superposition of 
the ectodomain of virion E from ZIKV and DENV-2 (ref. 3) results in 
a similar 1.5 Å r.m.s.d. value, indicating that they are presented roughly 
in the same way, but that DENV sE is more deformable in solution. 
This malleability may reflect the conformational breathing reported 
for DENV E. Instead, ZIKV sE remains in a similar conformation in 
the absence of the interactions with the underlying stem α-helices and 
with the M protein (the membrane-anchored remnant of prM after 
furin cleavage) on the virion, in line with the higher stability of the 
ZIKV particles described recently. 
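
For readers less familiar with the r.m.s.d. metric used in these comparisons, a minimal Python sketch follows. It assumes that the two sets of Cα coordinates have already been superposed and are paired atom by atom; the function name and the toy coordinates are illustrative assumptions and are not the software or data used in this study.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally sized lists of (x, y, z)
    coordinates, assumed to be already superposed and paired atom by atom."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))


# Toy example (hypothetical coordinates in angstroms): every atom of the second
# set is displaced by 1.5 along x, so the r.m.s.d. equals that displacement.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
displaced = [(x + 1.5, y, z) for (x, y, z) in reference]
print(f"r.m.s.d. = {rmsd(reference, displaced):.2f}")
```

In practice the superposition step (for example, a least-squares fit of one structure onto the other) precedes this calculation; it is omitted here to keep the sketch short.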


EDE1 C8 complex 

The total buried surface area of EDE1 C8 in the complex with ZIKV 
sE is about 900 Å², compared to about 1,300 Å² in the DENV-2 sE 
complex (Extended Data Table 3). Figure 3d shows the conserva- 
tion of the epitope, and Fig. 3e and f compare the C8 footprint on 
ZIKV and DENV-2 sE. The DENV-specific glycan at position N67, 
which is ordered in the DENV-2 sE structure (Fig. 3c), accounts for 
around two-thirds of the overall difference in footprint area. The 
N67 glycan interacts with the framework region 2 of the heavy chain 
(FRH2), and its absence in ZIKV sE shows that these contacts are 
not essential for binding. The key cluster of interactions that is main- 
tained is centred on β-strand b of domain II, with side chains from 

serotypes (dotted lines in pale colours) were taken from ref. 13, and are 
given here for comparison. The corresponding IC50 values are provided in 
Table 1. Note that the DENV-4 strain used was a natural isolate lacking the 
N153 glycosylation site (N−). 


complementarity determining regions (CDRs) H2, H3 and L3 
recognizing all the available hydrogen bond donors (main-chain 
amides) and acceptors (main-chain carbonyls) of the bdc β-sheet 
edge (Fig. 3b, c). In addition, the fusion loop main chain (which 
contains several glycine residues) and the disulfide bond between 
Cys74 and Cys105 are framed by aromatic side chains of the CDRs 
L1 and L3 (see also Extended Data Fig. 3). Residues from these two 
CDRs also recognize strictly conserved side chains of the fusion loop 
(Arg99) or nearby residues (Gln77). 

Across the dimer interface, and as in the complex with DENV-2, the 
150 loop is partially disordered, with no detectable density for the N154 
glycan (Fig. 3a and Extended Data Fig. 3d). As shown in Extended Data 
Fig. 3, the interacting residues across the dimer interface are different, 
reflecting the more limited sequence conservation in these regions of the 
E protein: in the DENV-2 sE complex, these contacts are with residues 
from β-strands A and B of domain III, but in ZIKV they mainly involve 
Lys373 from β-strand E interacting with CDRs L1 and L2, via a network 
of direct or water-mediated hydrogen bonds (Extended Data Fig. 3b, c). 
Similarly, several charged residues in domain I and from the nearby 
kl loop of domain II across the interface, contribute to the binding 
and interact with the heavy chain CDRs H2 and H3 (Extended Data 
Fig. 3e, f). All of the polar interactions between C8 and ZIKV sE are 
listed in Extended Data Tables 4 and 5, and the electrostatic surface 
of the epitope is shown in Extended Data Fig. 4, left panel. In sum- 
mary, these observations identify the conserved cluster of contacts with 
the b strand and the fusion loop in domain II as the main binding 
determinants of C8, with additional contacts from across the dimer 
interface—or from the N67 glycan in DENV—further stabilizing but 
not determining the interaction. 


EDE2 A11 complex 

Extended Data Fig. 4 compares the footprint of C8 and A11 on ZIKV 
sE, together with the surface electrostatic potential of the complexes, 
which shows a strong basic patch on sE in the C8 complex due to 

Figure 3 | EDE1 C8-ZIKV sE complex. a, Overall view of the complex, 
with the sE moiety coloured by domains (I, II and III in red, yellow and 
blue, respectively); the antibodies in grey and dark green for light and 
heavy chains, respectively. The CDRs are coloured (H1, light blue; H2, 
sand; H3, pink; L1, light grey; L2, magenta; L3, orange). The inset shows 
a comparison with the corresponding DENV-2 complex. For clarity, the 
variable region of the C8 Fab fragment of the DENV-2 sE-C8 complex 
was superposed on the C8 scFv in complex with ZIKV sE to draw the Fab 
axes and show the docking angles better. b, Zoom of the ZIKV sE-C8 
interaction to show the recognition of the b strand. Hydrogen bonds are 
shown as dotted lines and immobilized water molecules as red spheres. 
c, Same region on the DENV-2 sE-C8 Fab complex. Note that the N67 


the disorder of the 150 loop. As shown in Extended Data Fig. 5, C8 
would clash with the glycan had the loop remained in place, as was the 
case in the complex with DENV-2 sE. In the A11 complex, the 150 
loop remains in the same conformation as in the cryo-EM structures 
of the virion (Extended Data Fig. 5a) and in the X-ray structure of 
glycosylated unliganded sE reported here. In the DENV-2 sE-A11 
complex, the glycan is recognized by an α-helix in the long CDR 
H3 loop (Fig. 4e). The difference in length in the 150 loop of E in 
ZIKV compared to DENV shifts the glycan position by about 6-7 Å, 
such that it cannot make the same interactions with the CDR H3 
α-helix (Fig. 4d, e and Extended Data Fig. 5b). As a consequence, 
the A11 antibody docks at a different angle on ZIKV sE than it does 
on DENV-2 sE, even accounting for the difference in sE dimer cur- 
vature (Fig. 4a, inset). The contacts along the b strand are preserved 
(Fig. 4b, c). Compared to C8, the b strand is recognized only along 
half its length (residues 71 and 73), whereas C8 recognizes it all along, 
from residue 68 (or from 67 in DENV). 


glycan on DENV also interacts with the antibody. d, The footprint of EDE1 
C8 is outlined on the ZIKV sE dimer shown in surface representation 
(looking from outside the virion) coloured according to conservation of 
surface-exposed amino acids. Atoms from the main-chain and conserved 
side chains are orange, highly similar side chains are yellow, and all the 
others are white. e, f, Footprints of EDE1 C8 on a surface representation of 
ZIKV sE (e) and DENV-2 sE (f) shown in purple. FL, fusion loop. The two 
protomers of sE in the dimer are in light and dark grey. Relevant antigenic 
sE regions are labelled. Note the more confined interacting surface in the 
ZIKV sE dimer than in DENV-2; for example, the N67 glycan is absent in 
ZIKV sE. 


Discussion 

Our results identify the structural details of a quaternary epitope that 
provides a previously unrecognized link of potent cross-neutralization 
between Zika and dengue viruses, and thus identifies an antigenic 
Flavivirus cluster beyond the traditional serocomplexes. This rela- 
tionship defines a super serogroup on the basis of strong cross- 
neutralization through a conserved epitope that had not been recognized 
using polyclonal sera. This finding thus introduces the possibility of 
developing a universal vaccine protecting against all the viruses from 
this group. 

Vaccine design against dengue virus has been hampered by the het- 
erogeneity of DENV particles and the need to use polyvalent formulas 
to immunize against all four serotypes. One feature of DENV is that 
it undergoes incomplete furin maturation cleavage of prM in many cell 
types, giving rise to heterogeneous mosaic particles with an immature- 
like spiky patch on one side and a smooth mature-like region on the 
opposite side. These particles are infectious, as they can fuse with 

Figure 4 | EDE2 A11-ZIKV sE complex. Colour coding is as in Fig. 3. 
a, Overall view of the complex, with only one Fab bound per sE dimer, 
owing to crystal packing. The dashed ellipse represents the position of 
the missing A11 Fab. The inset compares the angle of binding to the sE 
dimer in ZIKV and in DENV-2. b, c, Interactions at the b strand in ZIKV 
(left) and in DENV-2 (right). Note the different angle of the b strand with 


the cellular membrane through the smooth, mature side. Because the 
FLE is exposed in immature regions, most of the antibody response 
in DENV-infected patients is directed against it. These cross-reactive 
antibodies coat the particles on the immature side but neutralize 
only weakly, because they can bind the mature side only when the 
E protein ‘breathes’. A recently published structure of monomeric 
ZIKV sE in complex with an FLE-specific monoclonal mouse antibody 
of low neutralizing activity indeed shows that its binding site would 
be occluded in the dimeric E protein on mature infectious virions. 
The observation that P6B10 and other FLE antibodies still neutralize 
DENV suggests that E in the mature patches on DENV spends more 
time in conformations that expose the FLE than does E in those patches 
on ZIKV. This inference is consistent with the higher thermal stability 
of ZIKV reported recently. 

Our results suggest that the epitope targeted by the EDE1 bnAbs 
is better suited for developing an epitope-focused vaccine for viruses 
in the ZIKV/DENV super serogroup than is the FLE, which induces 
poorly neutralizing and strongly infection-enhancing antibodies. 
The EDE1 is also better suited than the related EDE2 epitope: although 
the EDE1 bnAbs require an E dimer to bind, the actual binding 

respect to the antibody (the antibody is in exactly the same orientation in 
both panels). d, e, Zoom of the glycan on the 150 loop for ZIKV sE (d) and 
for DENV-2 sE (e), with sugar residue numbers described in the key. In 
ZIKV sE, the CDR H3 helix is too far away to make the interactions with 
the glycan that are seen in the DENV-2 structure (see Extended Data Figs 2b and 5b). 


determinants are centred on the b strand and on the highly con- 
served, E-dimer-exposed elements of the fusion loop, as shown by 
the comparison between their binding to DENV-2 and ZIKV sE. 
The fact that EDE2 bnAbs rely heavily on their contact points on the 
adjacent subunit—on the variable 150 loop in which glycosylation 
is not always present—is a drawback, as demonstrated by their poor 
affinity (Fig. 1) and their strong induction of antibody-dependent 
enhancement. 

Targeting the b strand and the E-dimer-exposed elements of the 
fusion loop appears to be a powerful alternative to the multi-immunogen 
approaches against the DENV cluster that have had limited success in 
clinical trials. As the E protein polypeptide chain displays neither 
insertions nor deletions in the region of the b strand for any medically 
relevant Flavivirus, this region presents a low risk of inducing escape 
mutations, most likely because it is also the interacting site with prM 
during virus maturation. Finally, in a more immediate application, 
our study also suggests that the EDE1 antibodies, perhaps carrying 
the ‘LALA mutation’ if effector functions are to be avoided, could be 
useful for immune prophylaxis for pregnant women at risk of contract- 
ing ZIKV infection. 

METHODS 


No statistical methods were used to predetermine sample size. The experiments 
were not randomized, and investigators were not blinded to allocation during 
experiments and outcome assessment. 

Recombinant production of ZIKV sE protein. Recombinant Zika virus sE pro- 
tein (strain H/PF/2013, GenBank accession number KJ776791) was produced 
with a tandem strep-tag in the Drosophila Expression System (Invitrogen) as 
described previously. A chemically synthesized DNA fragment (GeneArt) 
containing the Zika sE sequence (amino acids 1-408) was cloned into the 
expression vector pT389 (ref. 31) that encodes the export signal sequence BIP, 
an enterokinase cleavage site and the strep-tag. Drosophila Schneider 2 (S2) cells 
were stably transfected using blasticidin for selection. Protein expression was 
induced by the addition of CuSO4 and supernatants were collected 7-10 days after 
induction. Antigens were purified via affinity chromatography with Streptactin 
columns (IBA) according to the manufacturer's instructions. A final purification 
gel filtration step used a Superdex 200 Increase 10/300 GL column equilibrated 
in 50mM Tris, pH 8, 500 mM NaCl. 

Production of antigen-binding (Fab) and scFv fragments of the bnAbs. The 
bnAb fragments were cloned into plasmids for expression as Fab and scFv 
in Drosophila S2 cells. The constructs contain a tandem strep-tag fused at the 
C terminus (only of the heavy chain in the case of the Fab) for affinity purification. 
The purification protocol included a Streptactin affinity column followed by gel 
filtration as described above. 

Expression of human monoclonal anti-DENV E antibodies. Full IgG antibodies 
were produced in 293T cells (a gift from C. Lee), which tested free of mycoplasma 
contamination using the Lookout Mycoplasma PCR detection kit (Sigma MP0035). 
These cells were co-transfected with plasmids containing heavy and light chains 
of immunoglobulin G1 as described previously. 

Immune complex formation and isolation. The purified ZIKV sE protein was 
mixed with Fab A11 or scFv C8 (in approximately twofold molar excess) in stand- 
ard buffer (500 mM NaCl, 50 mM Tris, pH 8.0). The volume was brought to 0.5 ml 
by centrifugation in a Vivaspin 10 kDa cut-off; after a 30-min incubation at 4°C, 
the complex was separated from excess Fab or scFv by size-exclusion chromatog- 
raphy for ZIKV sE and scFv C8. For ZIKV sE and Fab A11, no apparent complex 
formation could be seen in size-exclusion chromatography; therefore a solution 
containing sE at a concentration of 1.5 mg ml−1 and Fab A11 at a concentration 
of 3 mg ml−1 (corresponding to a molar ratio ~1:2 antigen:antibody) was directly 
used for crystallization. In all cases, the buffer was exchanged to 150 mM NaCl, 
15mM Tris, pH 8, for crystallization trials. The protein concentrations used for 
crystallization, determined by measuring the absorbance at 280 nm and using 
an extinction coefficient estimated from the amino acid sequences, are listed in 
Extended Data Table 1. 
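
The conversion between the mass concentrations quoted above and the stated ~1:2 antigen:antibody molar ratio can be sketched as follows. The molecular masses used here (47 kDa for the sE monomer and 48 kDa for the Fab) are round, assumed values chosen for illustration; they are not stated in the text, and the function name is likewise hypothetical.

```python
def micromolar(mg_per_ml, mass_kda):
    """Convert a mass concentration (mg/ml) to micromolar for a protein of the
    given molecular mass (kDa): mg/ml divided by kDa gives mM, times 1,000."""
    return mg_per_ml / mass_kda * 1000.0


# Assumed molecular masses (illustrative only, not taken from the Methods).
SE_MONOMER_KDA = 47.0
FAB_KDA = 48.0

se_um = micromolar(1.5, SE_MONOMER_KDA)   # sE at 1.5 mg/ml
fab_um = micromolar(3.0, FAB_KDA)         # Fab A11 at 3 mg/ml
ratio = fab_um / se_um                    # Fab molecules per sE monomer

print(f"sE {se_um:.1f} uM, Fab {fab_um:.1f} uM, Fab per sE {ratio:.2f}")
```

With these assumed masses the Fab is present at close to two molecules per sE monomer, in line with the ~1:2 ratio given in the text.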

Real-time biolayer interferometry binding assays. The interactions of purified 
ZIKV E protein with IgG FLE P6B10, IgG EDE1 C8, IgG EDE2 A11, and control 
IgG 28C (an anti-influenza virus antibody) were monitored in real time using a 
biolayer interferometry Octet-Red384 device (Pall ForteBio). Anti-human IgG Fc 
capture biosensors (Pall ForteBio) were loaded for 10 min at 1,000 r.p.m. shaking 
speed using antibodies at 5 µg ml⁻¹ in assay buffer (PBS plus 0.2 mg ml⁻¹ BSA 
and Tween 0.01%). Unbound antibodies were washed away for 1 min in assay 
buffer. IgG-loaded sensors were then incubated for 15 min at 1,200 r.p.m. in the 
absence and presence of twofold serially diluted ZIKV sE protein in assay buffer. 
Molar concentrations were calculated for the sE protein in a dimeric form. The 
ZIKV sE concentration ranges used for the antibodies FLE P6B10, EDE1 C8 and 
EDE2 A11 were 0.78-50 nM, 3.125-200 nM and 50-3,200 nM, respectively. 
Reference binding experiments were carried out in parallel on sensors loaded 
with control IgG 28C. Dissociation of the complexes formed was then monitored 
for 10 min by dipping sensors in assay buffer alone. Operating temperature was 
maintained at 25°C. The real-time data were analysed using Scrubber 2.0 
(BioLogic Software) and BIAevaluation 4.1 (GE Healthcare). Specific signals were 
obtained by double-referencing, that is, subtracting non-specific signals measured 
on non-specific IgG-loaded sensors and buffer signals on specific IgG-loaded 
sensors. Association and dissociation profiles, as well as steady-state signal versus 
concentration curves, were fitted assuming a 1:1 binding model. 
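
For readers unfamiliar with the 1:1 binding model, the following sketch writes 
out the standard equations for the steady-state, association and dissociation 
signals, and generates a twofold dilution series such as the 0.78-50 nM range 
mentioned above. It is an illustration only and not the fitting procedure of 
Scrubber or BIAevaluation. All function names and the rate constants in the 
example are hypothetical. 

```python
import math


def twofold_series(top_nm, n_points):
    """Concentrations of a twofold serial dilution, highest first."""
    return [top_nm / (2 ** i) for i in range(n_points)]


def steady_state(conc_nm, kd_nm, rmax):
    """Equilibrium response of a 1:1 interaction: Req = Rmax*C / (C + KD)."""
    return rmax * conc_nm / (conc_nm + kd_nm)


def association(t_s, conc_nm, kon_per_nm_s, koff_per_s, rmax):
    """Response during association: R(t) = Req * (1 - exp(-(kon*C + koff)*t))."""
    kd_nm = koff_per_s / kon_per_nm_s
    k_obs = kon_per_nm_s * conc_nm + koff_per_s
    return steady_state(conc_nm, kd_nm, rmax) * (1.0 - math.exp(-k_obs * t_s))


def dissociation(t_s, r0, koff_per_s):
    """Response after transfer to buffer: R(t) = R0 * exp(-koff*t)."""
    return r0 * math.exp(-koff_per_s * t_s)


# Seven twofold steps from 50 nM reach 0.78 nM, as in the first range above.
series = twofold_series(50.0, 7)

# Hypothetical rate constants, chosen so that KD = koff/kon = 10 nM.
kon, koff = 1e-3, 1e-2
for c in series:
    # Signal at the end of a 15-min (900 s) association step.
    print(f"{c:8.3f} nM -> R(900 s) = {association(900.0, c, kon, koff, 1.0):.3f}")
```

In this model the equilibrium dissociation constant is the analyte concentration 
at which the steady-state signal reaches half of Rmax. 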

Crystallization and X-ray structure determinations. The crystallization 
and cryo-cooling conditions for diffraction data collection are listed in Extended 
Data Table 1. Crystallization trials were performed in sitting drops of 400 nl. 
Drops were formed by mixing equal volumes of the protein and reservoir solution 
in 96-well Greiner plates using a Mosquito robot, and were monitored with a 
Rock-Imager. Crystals were optimized using a robotized Matrix Maker and Mosquito 
setups on 400 nl sitting or hanging drops, or manually in 24-well plates using 
2-3 µl hanging drops. 

Because of the strong anisotropy of the crystals (see results for anisotropy in 
Extended Data Table 1), a large number of crystals were tested at several beamlines 
at different synchrotrons (SOLEIL, St Aubin, France; ESRF, Grenoble, France; 
SLS, Villigen, Switzerland). The crystals with the least anisotropic diffraction data 
were used to determine the structures. The data sets were indexed, integrated, 
scaled and merged using XDS and AIMLESS. A preliminary model of ZIKV 
sE protein was built from the DENV-2 sE (4UTA) structure using the structure 
homology-modelling server SWISS-MODEL. The structures of the complexes 
were then determined by molecular replacement with PHASER using the search 
models listed in Extended Data Table 1. The AIMLESS and PHASER programs were 
used within the CCP4 suite. 

The DEBYE and STARANISO programs developed by Global Phasing Ltd were 
applied to the data scaled with AIMLESS without applying a resolution limit, using 
the STARANISO server (http://staraniso.globalphasing.org/). These programs 
perform an anisotropic cut-off of merged intensity data on the basis of an analysis 
of local I/σ(I), compute Bayesian estimates of structure amplitudes, taking into 
account their anisotropic fall-off, and apply an anisotropic correction to the data. 
These corrected anisotropic amplitudes were then used for further refinement of 
the structures with BUSTER/TNT. Note that Extended Data Table 1 
shows the refinement statistics against the full sets of reflections truncated at the 
best high-resolution limit along the h, k or l axis. 

The models were then alternately corrected and completed manually using 
COOT and refined using BUSTER/TNT against the amplitudes corrected for 
anisotropy. Refinements were constrained using non-crystallographic symmetry. 
The refined structures have the following final Rwork/Rfree (in %) values: ZIKV sE- 
EDE1 C8 scFv (19.5/22.1), ZIKV sE-EDE2 A11 Fab (22.3/23.7) and unliganded 
ZIKV sE (20.8/23.6) (see Extended Data Table 1). 

Analysis of the atomic models and illustrations. Each complex was analysed with 
the CCP4 suite of programs and the polar contacts were computed with the PISA 
website. For the intermolecular interactions shown in Extended Data Figs 2 and 3 
and Extended Data Tables 4 and 5, the maximal cut-off distances used were 4 Å 
and 4.75 Å for polar and van der Waals contacts, respectively. Multiple sequence 
alignments were calculated using Clustal W and Clustal X version 2 (ref. 42) on 
the EBI server. The figures illustrating the structural models were prepared 
using ESPript and the PyMOL Molecular Graphics System, version 1.5.0.4 
(Schrödinger) (http://pymol.sourceforge.net). 
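
The distance cut-offs quoted above can be illustrated with a small sketch that 
labels an atom pair as a polar or a van der Waals contact. This is a deliberate 
simplification and not the PISA algorithm: it treats any nitrogen or oxygen pair 
within 4 Å as polar and any other pair within 4.75 Å as a van der Waals contact. 
The function name and the atom representation are hypothetical. 

```python
import math

POLAR_CUTOFF_A = 4.0    # maximal distance for polar contacts (from the text)
VDW_CUTOFF_A = 4.75     # maximal distance for van der Waals contacts (from the text)
POLAR_ELEMENTS = {"N", "O"}  # simplifying assumption: only N and O count as polar


def classify_contact(atom_a, atom_b):
    """Classify a pair of atoms given as (element, (x, y, z)) tuples.

    Returns "polar", "vdw" or None when the atoms are too far apart.
    """
    element_a, xyz_a = atom_a
    element_b, xyz_b = atom_b
    distance = math.dist(xyz_a, xyz_b)
    both_polar = element_a in POLAR_ELEMENTS and element_b in POLAR_ELEMENTS
    if both_polar and distance <= POLAR_CUTOFF_A:
        return "polar"
    if distance <= VDW_CUTOFF_A:
        return "vdw"
    return None


# Hypothetical coordinates in angstroms.
print(classify_contact(("N", (0.0, 0.0, 0.0)), ("O", (3.0, 0.0, 0.0))))
print(classify_contact(("C", (0.0, 0.0, 0.0)), ("C", (4.5, 0.0, 0.0))))
```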

Phylogenetic trees. The maximum likelihood phylogenetic trees were inferred using 
12 representative amino acid sequences of Flavivirus envelope protein E or RNA 
polymerase NS5 proteins, using the LG model available in PhyML and a combination 
of SPR+NNI branch-swapping. Bootstrap values were calculated from 100 
bootstrap replicates. The trees were visualized using Figtree 
(http://tree.bio.ed.ac.uk/software/figtree/). The accession codes of the sequences 
used in the trees are: Zika virus 
(ZIKV, KJ776791, strain H-PF-2013_French_Polynesia); dengue virus serotype 1 
(DENV-1, NC_001477); dengue virus serotype 2 (DENV-2, NC_001474); dengue 
virus serotype 3 (DENV-3, NC_001475); dengue virus serotype 4 (DENV-4, 
NC_002640); Saint Louis encephalitis virus (SLEV, NC_007580); Japanese 
encephalitis virus (JEV, NC_001437); Murray Valley encephalitis virus (MVEV, 
NC_000943); West Nile virus (WNV, NC_001563); yellow fever virus (YFV, 
NC_002031); tick-borne encephalitis virus (TBEV, NC_001672); and Powassan 
virus (POWV, NC_003687). 

Viral stocks. The African strain Zika HD78788 was obtained from the Institut 
Pasteur collection and the Asian strain Zika PF13, isolated from a patient during 
the ZIKV outbreak in French Polynesia in 2013, was obtained through the DENFREE 
(FP7/2007-2013) consortium. Viral stocks were prepared from supernatant of 
infected C6/36 cells (ATCC CRL-1660) clarified by centrifugation at 3,000g at 4°C 
and titrated on Vero cells (ATCC CRL-1586) by a focus-forming assay. Stocks were 
kept at −80°C until use. All cell lines were free from mycoplasma contamination. 

Neutralization assays. Virus neutralization by the tested human antibodies 
was evaluated using a focus reduction neutralization test (FRNT). About 100 
focus-forming units from virus stocks were incubated with a serial dilution of 
antibody for 1 h at 37°C. The mixture was then added to Vero cells and foci were 
left to develop in the presence of 1.5% methylcellulose for 2 days. Foci were then 
stained after fixation with 4% formaldehyde using anti-E 4G2 antibody (ATCC HB-112) 
and anti-mouse horseradish peroxidase (HRP)-conjugated secondary antibody 
(ThermoFisher 31430). The foci were visualized by diaminobenzidine (DAB) 
(Sigma D5905) staining and plates were counted using the ImmunoSpot S6 Analyser 
(Cellular Technology Limited, CTL). Neutralization curves and 50% FRNT values 
were calculated by nonlinear regression analysis using Prism 6 (GraphPad Software). 
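
To make the idea of a 50% FRNT value concrete, the sketch below estimates it 
from focus counts expressed as the fraction of foci remaining relative to a 
no-antibody control. It is not the Prism procedure: instead of true nonlinear 
regression it fits a two-parameter log-logistic curve by linearizing it with a 
logit transform and ordinary least squares. The function name and the synthetic 
data are hypothetical. 

```python
import math


def fit_frnt50(concentrations, fractions_remaining):
    """Estimate the 50% neutralization concentration and the Hill slope.

    Model: f = 1 / (1 + (c / FRNT50) ** h), which gives the straight line
    ln((1 - f) / f) = h * ln(c) - h * ln(FRNT50).
    Points with f <= 0 or f >= 1 cannot be logit-transformed and are skipped.
    """
    xs, ys = [], []
    for c, f in zip(concentrations, fractions_remaining):
        if c > 0 and 0.0 < f < 1.0:
            xs.append(math.log(c))
            ys.append(math.log((1.0 - f) / f))
    if len(xs) < 2:
        raise ValueError("need at least two usable data points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("concentrations must not all be identical")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    if slope == 0:
        raise ValueError("no dose dependence in the data")
    intercept = mean_y - slope * mean_x
    return math.exp(-intercept / slope), slope


# Synthetic, noise-free example generated with FRNT50 = 2.0 and h = 1.5.
concs = [0.25, 0.5, 1.0, 2.0, 4.0, 8.0, 16.0]
fracs = [1.0 / (1.0 + (c / 2.0) ** 1.5) for c in concs]
frnt50, hill = fit_frnt50(concs, fracs)
print(f"FRNT50 = {frnt50:.3f}, Hill slope = {hill:.3f}")
```

The logit linearization gives extra weight to points near 0% and 100% 
neutralization, so a proper nonlinear fit is preferable for noisy data. 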

Extended Data Figure 1 | Antibody binding to recombinant ZIKV 
protein. a, Biolayer interferometry experiments plotted on a linear scale. 
The antibodies were immobilized on the biosensor tip, and the ZIKV sE 
protein was in solution at the indicated concentrations. The antibody 
used is indicated in each plot. Note that the horizontal scale is different 
for the three antibodies. The estimated dissociation constant (Kd) and 
estimated dissociation rate (koff) values are indicated. b, Size-exclusion 
chromatography results for isolated sE, isolated Fab fragments, and ZIKV 
sE plus Fab fragments, as indicated. 

Extended Data Figure 2 | Residues involved in bnAb-antigen 
interactions. a, Antibody contacts on the amino acid sequence alignment 
of ZIKV and DENV-2 sE. A red background highlights identical 
residues. Secondary structure elements are indicated together with their 
labels above (ZIKV) and below (DENV-2) the sequences. The domain 
organization of ZIKV and DENV-2 sE is symbolized by a coloured bar 
above the sequences (domain I red, domain II yellow, domain III blue 
and the fusion loop orange). Residues involved in polar and van der 
Waals protein-protein contacts are marked using blue and green symbols, 
respectively, as indicated in the inset key, displayed above and below 
the alignment for ZIKV and DENV-2 sE, respectively. Full and empty 
symbols correspond to antibody contacts on the reference subunit of sE 
(defined as the one contributing the fusion loop to the epitope) and the 
opposite subunit of sE, respectively. Residues contacted only by the heavy 
or light chain are marked with squares or triangles, respectively, and those 
contacted by both antibody chains with circles. Dots above the sequences 
mark every 10 residues on the ZIKV sE sequence. Disulfide bridges are 
numbered in green above the sequences. b, Amino acid sequence of the 
heavy and light chain variable domains (vH and vL) of bnAbs EDE1 C8 
(top) and EDE2 A11 (bottom) with the framework (FRW) indicated by 
black bars and IMGT CDR regions by thin dashed lines. The secondary 
structure elements of the Ig vH and vL β-barrels are indicated above 
the sequences. Somatic mutations are in red and residues arising from 
recombination at the V-D-J junction are in green. Symbols above and 
below the sequences mark residues involved in contacts with ZIKV and 
DENV-2 sE, respectively, coded for the contacted site in sE as indicated 
in the key (inset at the bottom). Polar and van der Waals contacts are 
shown in blue and green, respectively. The antibody residues contacting 
the reference sE subunit (defined as the one contributing the fusion loop 
to the epitope) are marked by plain colour symbols while those making 
contact across the dimer interface are marked by empty coloured symbols. Red boxes 
highlight the contacts found in the DENV-2 sE complex and absent in the 
ZIKV sE complex, involving the N67 glycan and the kl and 150 loops. The details of 
the polar contacts are listed in Extended Data Tables 4 and 5 (see also 
Fig. 3e, f). The predicted vH and vL germline alleles are indicated with the 
corresponding CDR lengths (see Table 1 in ref. 16). 

Extended Data Figure 3 | Details of EDE1 C8 bnAb contact across the 
dimer interface. a, Overall view of the ZIKV sE-EDE1 C8 scFv complex. 
The box indicates the region zoomed in b. b, Details of the interactions 
of the C8 light chain with domain III across the dimer interface. 
c, Same region for the DENV-2 sE-EDE1 C8 Fab complex. Note that 
the sE residues involved are different. d, The complex rotated by 120° 
(as indicated by the arrow) to show the interaction in the ij loop, enlarged 
in e. e, The ij loop is displayed in sticks, to show the interaction of its main 
chain with the antibody. Domain II from the opposite subunit is coloured 
green to distinguish it from domain II of the reference subunit; the dashed 
sticks for the arginine residue indicate that it has poor electron density 
in the crystal. f, Same view of the complex with DENV-2. Note that the 
residues from across the dimer interface that contact the antibody are 
different. The residues in the various CDRs are colour coded, matching 
their label colour (as in Figs 3 and 4). 

[Figure panels: ZIKV sE / EDE1 C8 scFv and ZIKV sE / EDE2 A11 Fab; colour bar from −5.0 kT/e to +5.0 kT/e.] 

Extended Data Figure 4 | Surface electrostatic potential on an open-book 
representation of the immunocomplexes. The electrostatic 
potential is coloured according to the bar underneath. The antibody 
footprints are outlined in green. The disordered 150 loop in the complex 
with C8 (left) results in a positive surface patch at one edge of the epitope, 
which is counteracted by the residues in the 150 loop, as shown on the 
right in the complex with A11 where this loop is ordered. 

Extended Data Figure 5 | Details of the A11 interaction with the 
glycan on the 150 loop. a, Superposition of the ZIKV sE-A11 complex 
(in colours) on the E protein from the cryo-EM structure of the mature 
virion (ref. 5) (PDB code 5IRE) in white. The E protein was superimposed 
on the tip of domain II of the reference subunit together with domain III 
from the opposite subunit. It shows that the 150 loop adopts essentially 
the same conformation, although fewer sugar residues are visible in the 
absence of the antibody. b, Superposition of the ZIKV sE-A11 complex 
(in colours) on the DENV-2 sE-A11 complex (in white). The variable 
domains of the antibody from the two structures were superimposed on 
each other. Note that in DENV-2 the glycan packs against the α-helix of 
the CDR H3, whereas in ZIKV sE the glycan is too far away to make the same 
interaction. c, The ZIKV sE-C8 complex (in pink) was superimposed on 
the ZIKV sE-A11 complex (in colours), to show the clash of the C8 light 
chain with the glycan, forcing it to move out of the way and be disordered. 
The superposition also shows that EDE1 C8 reaches further in to contact 
the ij loop and the kl loop of the adjacent subunit, as well as domain III. As 
in a, the superposition was done using the tip of domain II of the reference 
subunit and domain III of the adjacent subunit in the dimer as anchors. 
The two black asterisks mark the places where the electron density of the 
150 loop is lost, resulting in no density in the ZIKV sE-C8 crystal for the 
short helix, nor for the glycan. 

Extended Data Table 1 | Crystallization conditions, data collection and refinement statistics 

Structure | ZIKV sE-EDE1 C8 scFv | ZIKV sE-EDE2 A11 Fab | ZIKV sE 
Protein Data Bank code | 5LBS | 5LCV | 5LBV 

Crystallization conditions 
Protein conc. (mg/ml) | 1.3 (complex) | 1.5 (ZIKV sE), 3 (EDE2 A11 Fab) | 1.4 
Crystallization buffer | 1.26 M (NH4)2SO4, 0.1 M CHES pH 9.5, 0.2 M NaCl | 3.5 M Na formate, 0.1 M Tris pH 8.5 | 28.6% PEG 8K, 0.1 M Bicine pH 8.3 
Crystallization method | sitting drop at 18°C | sitting drop at 18°C | hanging drop at 18°C 
Cryo-protectant | 20% ethylene glycol in 67% of crystallization solution | 16% glycerol in 67% of crystallization solution | oil mixture paraffin:paratone (1:1) 

Data collection 
Beamline | SOLEIL, Proxima 1 | SOLEIL, Proxima 2 | SOLEIL, Proxima 1 
Detector | Pilatus 6M | Eiger 9M | Pilatus 6M 
Space group | P2₁22₁ | C222₁ | C222₁ 
Unit cell: a, b, c (Å) | 60.8, 121.3, 257.8 | 204.3, 207.3, 124.6 | 64.6, 213.6, 124.0 
Unit cell: α, β, γ (°) | 90, 90, 90 | 90, 90, 90 | 90, 90, 90 
Resolution (Å) | 40.0-2.41 (2.46-2.41) | 40.0-2.64 (2.69-2.64) | 38.5-2.2 (2.25-2.20) 
Anisotropy direction: resolution where CC1/2 > 0.50 
  overall (Å) | 2.69 | 2.87 | 2.2 
  along h axis (Å) | 3.24 | 4.23 | 3.26 
  along k axis (Å) | 2.95 | 2.64 | 2.2 
  along l axis (Å) | 2.41 | 2.9 | 2.2 
Measured reflections | 560 284 (35 314) | 760 128 (38 976) | 318 556 (18 018) 
Unique reflections | 74 842 (4 571) | 77 483 (4 547) | 44 168 (3 025) 
Completeness (%) | 100 (100) | 99.8 (99.4) | 99.5 (93.4) 
Mn(I) half-set correlation | 0.99 (0.15) | 0.98 (0.17) | 0.99 (0.33) 
Mean I/σ(I) | 6.7 (1.4) | 5.1 (0.4) | 9.4 (0.5) 
Multiplicity | 7.5 (7.7) | 9.8 (8.6) | 7.2 (6.0) 
B Wilson (Å²) | 41.2 | 47.4 | 60.1 
Rmerge | 0.33 (3.1) | 0.50 (6.6) | 0.10 (3.3) 
Rmeas | 0.36 (2.9) | 0.53 (7.1) | 0.11 (3.7) 
Rpim | 0.13 (1.2) | 0.17 (2.4) | 0.04 (1.5) 

Structure determination 
MR search models | ZIKV sE from ZIKV sE-EDE2 A11 Fab complex (5LCV); EDE1 C8 Fab variable domain (4UTA) | ZIKV model sE from DENV-2 sE (4UTA); EDE2 A11 scFv (4UT7); EDE2 A11 Fab constant domain (4UTB) | ZIKV sE from ZIKV sE-EDE1 C8 scFv complex (5LBS) 
NCS restraint | 2 | 2, applied only on sE dimer | 2 
Targeting | NA | EDE2 A11 scFv (4UT7) | NA 
Number of TLS groups | 12 | 12 | 12 

Refinement 
Resolution cut-off (Å) | 40.0-2.41 | 40.0-2.64 | 40.0-2.2 
Rwork (%) / Rfree (%) | 19.5 / 22.1 | 22.3 / 23.7 | 20.8 / 23.6 
No. of work / free reflections | 74 785 / 3 719 | 76 253 / 3 840 | 44 038 / 2 219 
<B> atomic factors (Å²) | 72.6 | 89.3 | 64.1 
No. of protein atoms | 9 595 | 9 495 | 5 978 
No. of heteroatoms | 212 | 43 | 43 
R.m.s.d. from ideal 
  Bond lengths (Å) | 0.01 | 0.01 | 0.01 
  Bond angles (°) | 1.22 | 1.28 | 1.24 
Ramachandran|| 
  Favoured (%) | 96.5 | 95.7 | 97.3 
  Allowed (%) | 3.26 | 3.65 | 1.86 
  Outliers (%) | 0.24 | 0.65 | 0.84 

The ZIKV sE buffer used for all crystallization experiments was: 150 mM NaCl, 15 mM Tris, pH 8. 
The protein concentration was estimated using theoretical extinction coefficients of the complexes (ZIKV sE + Fab or scFv). Absorbance at 280 nm (A280 nm) of the protein solution was measured before crystallization. The theoretical extinction coefficients for individual components are: ZIKV sE: 1.345; bnAb EDE2 A11 Fab: 1.68 (see Methods for more details); bnAb EDE1 C8 scFv: 1.9. Extinction coefficients were calculated without taking carbohydrate moieties into account. One crystal was used to collect a diffraction data set for each complex to determine the structure. 
The anisotropy statistics were computed with AIMLESS. 
Highest resolution shell is shown in parentheses. 
Low-resolution for data processing and refinements was truncated to 40 Å. 
||Ramachandran statistics were calculated with MolProbity. 
CC1/2, correlation coefficient; h, k and l, indices that define the lattice planes; I/σ(I), empirical signal-to-noise ratio; MR, molecular replacement; NCS, non-crystallographic symmetry; Rmeas, multiplicity-corrected R; Rpim, expected precision; TLS, parameterization describing translation, libration and screw-motion to model anisotropic displacements. 

Extended Data Table 2 | The r.m.s.d. values between sE dimers in the various structures of ZIKV and DENV-2 


ZIKV sE 
(this work) 


ZIKV sE- ZIKV sE- 


EDE2 A11 Fab 


(this work) (this work) 


EDE1 C8 scFv* 


ZIKV sE 
(Dai et al, 
2016) 


ZIKV 
mature virion 
sE dimer 
from au 


ZIKV 
mature virion 
icosahedral 
sE dimer 


ZIKV 
thermally 
stable 
mature virion 
sE dimer 
from au 


ZIKV 
thermally 
stable 
mature virion 
icosahedral 
sE dimer 


ZIKV sE (this work) / 


ZIKV sE-EDE2 A11 Fab 3.33 (796) 


/ 


2.31 (776) 


ZIKV sE-EDE1 C8 scFv" 2.29 (764) 


2.41 (773) 


2.29 (761) 0.57 (764)* 


ZIKV sE (Dai et al, 2016) 1.8 (769) 


ZIKV mature virion BIRE| 3.22 (800) 
sE dimer from au 


ZIKV mature virion 


icosahedral sE dimer eRe 


2.88 (800) 


3.06 (771) 
2.93 (761) 
1.57 (778) 
1.69 (764) 


1.60 (778) 
1.71 (764) 


4.1 (768) 


1.72 (799) 


1.76 (799) 


3.74 (775) 


0.46 (806)$ 
0.45 (1002)! 


ZIKV thermally stable 
mature virion 

sE dimer from au 
ZIKV thermally stable 
mature virion 
icosahedral sE dimer 


51Z7 | 3.03 (800) 


2.94 (800) 


1.75 (778 
1.80 (799) 1.88 Lee 
4.67 (78) 


eT) 1.80 (764) 


3.89 (775) 


3.78 (775) 


1.36 (806)8 
1.57 (1002)! 
1.36 (806)$ 
1.56 (1002)! 


1.36 (806) 
1.56 (1002)! 
1.36 (806)§ 
1.54 (1002)! 


1.41 (806)$ 
1.43 (1008)! 


ZIKV sE 
(this work) 


ZIKV/DENV-2* 


PDB 


code 5LBV 


5LCV 5LBS 


ZIKV sE- ZIKV sE- 
EDE2 A11 Fab | EDE1 C8 scFvt 
(this work) (this work) 


ZIKV sE 
(Dai et al, 
2016) 


5JHM 


ZIKV sE 
mature virion 
sE dimer 
from au 


SIRE 


ZIKV sE 
mature virion 
icosahedral 
sE dimer 


SIRE 


ZIKV ZIKV 
thermally thermally 
stable stable 
mature virion | mature virion 
sE dimer icosahedral 
from au sE dimer 


5IZ7 5IZ7 


AUTC| 4.57 (766) 


DENV-2 sE-EDE2 A11 Fab 


AUTB| 4.45 (773) 


5.49 (756) 
5.55 (742) 


5.61 (763) 
5.57 (744) 


6.47 (768) 


7.08 (774) 
0.29/0.28 (241)" 


4.75 (753) 


4.2 (754) 


5.94 (772) 


6.29 (773) 


5.94 (772) 


6.28 (773) 


6.23 (757) | 6.18 (757) 


7.38 (763) | 7.31 (763) 


DENV-2 sE-EDE1 C8 Fab |4UTA| 6.04 (745) 8.17 (744) ee an 5.71 (747) | 7.52(751) | 7.49(751) | 7.65 (738) 7.57 (738) 
z ivi § § § § 
Bey 2 ints virion 3.89 (779) 2.26 (778) as vey 4.82 (766) | 2-01 (785) 2.04 (785)8 | 4.33(770)§ | 4.36 (770) 
Sse aimeriromiau 81 (755) 2.09 (979)' | 2.13 (979)! | 2.15 (988)! | 2.20 (988)! 
z iri § § § § 
se aN p 3.59 (779) 2.45 (778) Bee be 4.49 (766) 1.99 (785) 1.99 (785) 4.31 (770) 4.33 (770) 
eo eaec ae eer 68 (755) 2.07 (979)' | 2.07 (979)! | 2.14 (988)! | 2.15 (988)! 
DENV-2 penis 
mature Pars 
* DENV-2 sE- DENV-2 sE- a mature virion 
DENV-2 DENV-2 SE | EE? Ati Fab| EDE1C8Fab | Vin | ieocahedral 
Se dimer sE dimer 


DENV-2 mature virion 
sE dimer from au 


) / 
) 6.99 


from au 


DENV-2 mature virion 


icosahedral sE dimer a 


4.92 (779) 


6.37 (775) 6.56 


0.89 (790)8 
0.88 (990)! 


r.m.s.d. (in Å) computed with PyMOL software using the Cα atoms of the sE dimers. In parentheses, the number of Cα atoms used for the calculation. au, asymmetric unit. 
*The r.m.s.d. is computed using residues 1 to 403 for ZIKV sE, or residues 1 to 395 for DENV-2 sE, except when indicated (§, ||). 
†There are two independent half dimers (ZIKV sE-EDE1 C8 scFv) in the asymmetric unit. The r.m.s.d. is computed between the two dimers of sE generated by crystallographic symmetry for each sE in the asymmetric unit. 
‡This r.m.s.d. is computed between the two independent half dimers of sE in the asymmetric unit. 
§This r.m.s.d. is computed between the sE dimers excluding stem and transmembrane regions of ZIKV (residues 1 to 403) and DENV-2 (residues 1 to 395). 
||This r.m.s.d. is computed between the sE dimers including stem and transmembrane regions of ZIKV and DENV-2 (residues 1 to C-terminal). 
¶The two r.m.s.d. values refer to the superposition of the variable domains of Fab A11 in the ZIKV sE-EDE2 A11 Fab complex on each Fab A11 in the DENV-2 sE-EDE2 A11 Fab complex. 
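
As a reminder of what the tabulated quantity measures, the sketch below 
computes the r.m.s.d. between two equally sized lists of Cα coordinates. It 
assumes that the two structures have already been superposed and that the atoms 
are paired in the same order; the optimal superposition step performed by PyMOL 
is not reproduced here. The function name and the coordinates are hypothetical. 

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired (x, y, z) coordinates.

    Both lists must have the same, non-zero length and be pre-superposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared_sum = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared_sum / len(coords_a))


# Two hypothetical atoms, each displaced by 1 angstrom along z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
moved = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(reference, moved))
```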

Extended Data Table 3 | Buried surface areas and surface complementarity in the ZIKV sE dimer-EDE complexes and in DENV-2 sE dimer-EDE complexes 

Columns: BSA Fab or scFv (vH | vL | Total) | BSA sE dimer (Reference subunit (glycans*) | Opposite subunit (glycans*) | Total | Main-chain atoms‡ | Glycans) | Complex (BSA/molecule | SC) 

ZIKV sE-EDE1 C8 scFv fragment§ 
ZIKV sE dimer, epitope dimer-1 | 426.1 | 494.8 | 920.9 | 653.0 (NA) | 237.2 (NA) | 890.2 | 346.9 (39%) | NA | 905.5 | 0.683 
ZIKV sE dimer, epitope dimer-2 | 438.4 | 492.4 | 930.8 | 678.1 (NA) | 223.2 (NA) | 901.3 | 343.9 (38%) | NA | 916.0 | 0.738 

DENV-2 sE-EDE1 C8 Fab fragment (4UTA) 
DENV-2 sE dimer, epitope A | 718.9 (516.8)† | 471.0 (471.0)† | 1189.9 | 919.1 (192.9) | 222.2 | 1141.3 | 357.8 (31%) | 192.9 (17%) | 1165.6 | 0.693 
DENV-2 sE dimer, epitope B | 831.9 (570.3)† | 528.3 (528.3)† | 1360.2 | 945.5 (230.9) | 340.0 | 1285.5 | 359.0 (28%) | 230.9 (18%) | 1322.8 | 0.687 

ZIKV sE-EDE2 A11 Fab fragment|| 
ZIKV sE dimer, epitope | 718.5 | 75.7 | 793.4 | 570.0 (0.0) | 217.4 (134.6) | 787.4 | 253.7 (32.2%) | 134.6 (17%) | 790.4 | 0.674 

DENV-2 sE-EDE2 A11 Fab fragment (4UTB) 
DENV-2 sE dimer, epitope A | 923.9 (251.3)† | 189.2 (148.8)† | 1113.0 | 531.8 (14.2) | 472.6 (342.4) | 1004.4 | 221.5 (22%) | 356.6 (35%) | 1058.7 | 0.706 
DENV-2 sE dimer, epitope B | 954.1 (302.2)† | 185.1 (136.7)† | 1138.8 | 460.3 (58.5) | 571.3 (341.4) | 1031.6 | 219.3 (21.2%) | 400.0 (39%) | 1085.2 | 0.668 

BSA, buried surface area (in Å²) of sE protein by the Fabs or scFv (calculated with the program 'areaimol' in CCP4). BSA/molecule, average buried surface area between one Fab or one scFv and the sE 
dimer. SC, shape complementarity coefficient (calculated with the program 'sc' in CCP4). The dot density used to compute both BSA and SC was set to 15 dots per Å². The van der Waals probe radius 
was set to 1.4 Å. NA, not applicable. 
*In parentheses, contribution of glycan chains to buried surface area: N154 (for ZIKV sE) and N67 and/or N153 (for DENV-2 sE). 
†In parentheses, the BSA was computed removing the glycan chains N67 for DENV-2 sE, to compare with the BSA of ZIKV sE, which does not carry the N67 glycan. 
‡Contribution of main-chain atoms to BSA. In parentheses, contribution indicated as percentages. 
§There are two independent half dimers (ZIKV sE-EDE1 C8 scFv) in the asymmetric unit. The two dimers of ZIKV sE (dimer-1 and dimer-2) are generated by applying crystallographic symmetry for 
each sE in the asymmetric unit. 
||Only one Fab A11 binds to the sE dimer in the ZIKV sE-EDE2 A11 Fab complex. 

Extended Data Table 4 | Polar and salt-bridge interactions for ZIKV sE-EDE2 A11 Fab, DENV-2 sE-EDE2 A11 Fab, ZIKV sE-EDE1 C8 scFv 
and DENV-2 sE-EDE1 C8 Fab 
domains ZIKV sE - EDE2 Fab A111 (5LCV) DENV-2 sE - EDE2 Fab A11 (4UTB) epitope A DENV-2 sE - EDE2 Fab A11 (4UTB) epitope B 
of sE sE dist (A) Fab A11 CDR sE dist (A) Fab A11 CDR sE dist (A) Fab A11 CDR 
$70 [0] 2.96 R95 [NH1] L3 
$70 [0] 2.96 R95 [NH2] L3 
T 70 [0G1] 3.53 S$ 55 [0] H2 T 70 [0G1] 3.42 $55 [0] H3 
D 71 [0D1] 3.72 R95 [NH1] L3 
2 D 71 [OD1] 266  $100J[OG] H3 
£ S 72 [OG] 3.21 D 1001 [N] H3 S 72 [OG] ae D 1001 [N] H3 S$ 72 [OG] 3.61 D 1001 [N] H3 
- S 72 [OG] 2.71 D4100I[OD1] H3 S 72 [0G] 2.78 D100I[0D1] H3 S72 [OG] 3.01 D1001[0D1] H3 
S72 [0] 3.49  $100J[N] H3 $72 [0] 3.69 $100J[N] H3 
$72 [N] 3.39  P100H [0] H3 S72 [N] 3.45 P 100H [0] H3 
r= R73 [N] 3.44 D41001[(OD1] #3 
3 R73 [NE] 2.83 S$ 100J[OG] _H3 R 73 [NE] 3.59 S$ 100J[OG] _H3 
2 D 98 [0] 3.51 Y100G[OH] H3 
2 R 99 [NH1] 2.78 D 100! [OD1] H3 
5 R 99 [NH1] 3.47 D1001[OD2] 43 
® | cusion | R99 [NH2] 3.60 D4100I[OD1] #3 R99[NH2] 2.64 D4100I1[OD2] 3 R99[NH2] 2.76 D1001[OD1] #3 
loop R 99 [NH2] 2.70 D1001[OD2] 3 R99[NH2] 2.83 D4100I[OD1] H3 R 99 [NH2] 2.49 D4100I1[OD1] H3 
= G 102 [0] 3.16 $100C[N] H3 G 102 [0] 3.49  $100C [N] H3 G 102 [0] 3.83  $100C [N] H3 
: G 102 [0] 3.25 $100C[OG] H3 G 102 [0] 2.88 S100C[OG] H3 G 102 [0] 3.02 $100C[OG] H3 
s N 103 [N] 3.84 Y100A[O] H3 
G 104 [0] 3.78 R98 [NH1] H3 
K 251 [0] 2.72 Y100G[OH]  H3 K 246 [0] 2.77 Y100G [OH H3 K 246 [0] 2.76 Y 100G [0] H3 
K 247 [NZ] 3.39 D53[OD1]  H2 
ij loop K 247 [NZ] 3.7 D53[OD2] H2 
Q 248 [0] 3.23. Y 100F [0] H3 Q 248 [0] 3.70  Y 100F [OH] H3 
Q 248 [N] 3.18 Y 100F [OH H3 Q 248 [N] 3.85 Y 100F [OH] H3 
NG? no glycan at this position 
glycan 
dom lll |__P354 [0] 3.71 _N 27B[ND2]___L1 
V 153 [O] 3.5 S$ 100C [OG] H3 G 152 [O] 3.62 S$ 100C [OG H3 
150 D 154 [OD2] 2.79 N 31 [ND2] H1 D 154 [OD2] 3.64 N 31 [ND2] H1 
loop D154 [N] 3.17 $100C [OG H3 D154 [N] 3.35 $100C[OG] H3 
K 157 [NZ] 3.34 S 28 [OG] H1 K 157 [NZ] 3.20 S 28 [OG] H1 
8 N153-1 [N2] 3.43 F 99 [0] H3 N153-1 [N2] 3.28 F 99 [0] H3 
° N154-1[N2] 3.37 S 28 [OG] H1 
& N154-1 [03] 3.09 S 28 [OG] H1 
= Ss N153-3[03] 3.86 Y100[OH] H3 
§/s2 N153-4[03] 3.89 Y100[OH] H3 
E | 2d N153-4[04] 2.86 S 56 [OG] L2 
3) 25 N153-4[04] 3.64 S 56 [N] L2 N153-4[04] 3.08 S 56 [N] L2 
z N153-4[06] 3.69 S 56 [N] L2 N153-4[06] 3.55 S56 [N] L2 
N153-4[03] 3.89  Y100[OH] H3 
N153-6[02] 3.54 R94[NH2]  H3 N153-6[02] 3.81 R 94 [NH2] H3 
N153-6[05] 3.34 S 56 [OG] L2 
ZIKV sE - EDE1 scFv C8 (5LBS) DENV-2 sE - EDE1 Fab C8 (4UTA) epitope A DENV-2 sE - EDE1 Fab C8 (4UTA) epitope B 
SE dist (A) scFv C8 CDR sE dist(A) Fab C8 CDR SE dist (A) Fab C8 CDR 
M 68 [0] S72 $5 [OG] H2 T 68 [0] 3.11 A57 [N] H2 T 68[0] 3.11 A57 [N] H1 
M 68 [O] 3.22 A57 [N] H2 
S$ 70 [OG] 2.94 S 56 [OG] H2 T70[0G1 2.96 S$ 56 [OG] H2 T 70 [0G1] 34 S 56 [OG] H1 
5 $70 [N] 3.14 S 56 [OG] H2 T 70 [N] 3.23 S 56 [OG] H2 T 70 [N] 3.14 S 56 [OG] H1 
5 S$ 70 [0] 3.14 W294 [NE1] L3 T 70 [0] 2.76 W294 [NE1] L3 T 70 [0] 28 W 94 [NE1] L3 
7 S$ 72 [0] 3.12 N 93 [ND2] LS $72 [0] 3.48 N93 [ND2] L3 
= Q77 [NE2] 3.34 Y 92 [OH] L3 Q77 [NE2 3.46 Y 92 [OH] L3 
7 D 83 [0D1] 3.56 K 64 [NZ] FH3 
E 84 [OE1] 3.67 K 64 [NZ] H2 E 84 [OE2] 3.94 K 64 [NZ] H2 
2 E84[OE2] 3.71 K 64 [NZ] H2 E 84 [OE1] 3.83 K 64 [NZ] H2 
2 R 99 [NH2] 3.65 N93 [0] L3 
2 | Fusion | R99 [NH1] 2.87 N 93 [OD1] L3 R 99 [NH1 34 N 93 [OD1] L3 R 99 [NH1] 2.96 N 93 [OD1] L3 
3 loop R 99 [NH2] 3.02 N 93 [OD1] L3 R 99 [NH2 2.98 N93 [OD1] L3 R 99 [NH2] 2.89 N 93 [OD1] L3 
o G 104 [0] 2.74 N93 [N] L3 G 104 [0] 2.86 N 93 [N] L3 G 104 [0] 2.93 N 93 [N] L3 
me R 252 [NH2] 3.37 D 55 [OD2] H2 K 247 [NZ] 2.75 D 55 [OD2] H2 K 247[NZ] 2.78 D 55 [OD2] H2 
< . K 247 [NZ] 3.84. E53[0E1] H2 
ge | # loop | @ 253 [N] 3.11 Y 100 [OH] H3 Q 248 [N] 3.03 Y 100 [OH] H3 Q 248 [N] 2.97 Y 100 [OH] H3 
8 Q 253 [0] 3.37 Y 100 [OH] H3 Q 248 [0] 3.5 Y 100 [OH] H3 
N67 ee N67-1 [03] 3 G 65 [0] H2 N67-1 [03] 3.0 G 65 [0] H2 
glycan ECE ng eliyeein N67-4 [02] 27 $82B[OG] _FH3 N67-4[02] 3.32 $82B[OG] FH3 
hia disordered loop disordered loop disordered loop 
N153/ 
N154 disordered glycan disordered glycan disordered glycan 
glycan 
E S kl loop disordered loop disordered loop § 274 [N] 3.3 E 53 [OE2] H1 
si 
K 310 [NZ] 3.69 D50[OD1] L2 K 310 [NZ] 3.62  D50[OD1] L2 
. E311[0E1] 3.94 R66[NH1]  FL3 E311[0E1] 3.35 R66[NH1] FL3 
Q E311[0E2] 2.74 R66[NH1]  FL3 E311[0E1] 3.65 R66 [NH2] FL3 
S |Astrand E311[0E2] 3.28 R66[NH2]  FL3 E 311 [OE2] 2.8 R 66 [NH1] FL3 
E 311 [OE1] 3.28 S 30 [OG] FL4 
= T 315 [0] 2.58 R 66 [NH1] FL3 
s D362[0D2] 3.69 == R54 [NH1] L2 
5 K 373 [NZ] 2.85 S$ 52 [OG] L2 
© TE strand K 373 [NZ] 2.98 T 53 [OG] L2 
K 373 [NZ] 3.56 D 50 [0] L2 


The polar contacts were computed using the PISA server (http://www.ebi.ac.uk/msd-srv/prot_int/pistart.html). Bold red font denotes main-chain atoms involved in contacts; bold black font denotes 
salt bridges; bold blue font denotes glycan interactions. Hydrogen bond distance cut-off: 3.5 Å; salt-bridge distance cut-off: <4 Å. The green background refers to polar contact distances between 
3.5 Å and 3.9 Å. 

Extended Data Table 5 | Polar and salt-bridge interactions for ZIKV sE-EDE2 A11 Fab, DENV-2 sE-EDE2 A11 Fab, ZIKV sE-EDE1 C8 scFv 
and DENV-2 sE-EDE1 C8 Fab 


ZIKV sE - EDE2 Fab A11 (5LCV) DENV-2 sE - EDE2 Fab A111 (4UTB) epitope Aj | DENV-2 sE - EDE2 Fab A11 (4UTB) epitope B 
CDR sE sE dist (A) Fab A11 CDR sE dist (A) Fab A11 CDR sE dist (A) Fab A11 CDR 
u1 dom Ill P 354 [O] 3.71 N 27B[ND2]___L1 
ie — ao iS N153-4 [04] 2.86 S 56 [OG] L2 N153-4 [04] 3.08 S 56 [N] L2 
Ss 2 & 3 3S g N153-4 [04] 3.64 S 56 [N] L2 
3 5 Paes 3 N153-4 [06] 3.69 S 56 [N] 2 N153-4 [06] Sr55) S 56 [N] L2 
D iid N153-6 [05] 3.34 S 56 [OG] L2 
= S 70 [O] 2.84 R 95 [NH1] L3 
[3 | bstrand | 474 10p1] 3.75 sR 9S[NH1] 3 
<t | N154-1[N2] 3.24 S 28 [OG] H1 
eal 8 N154-1 [03] 3.03 S 28 [OG] H1 
H1 |S §|= & 
3 °) 450 D154[OD2] 2.79 N 31 [ND2] D 154 [OD2] 3.64 N 31 [ND2] H1 
loop K 157 [NZ] 3.34 S 28 [OG] K 157 [NZ] 3.20 S 28 [OG] H1 
ii loo K 247 [NZ] 3.39 D 53 [OD1] 
H2 tise K 247 [NZ] 3.7 D 53 [OD2] 
b strand T 70 [OG1] 3.5 S 55 [O] H2 T 70 [OG1] 3.42 S 55 [O] H2 
S 72 [OG] 3.30 D 1001 [N] H3 S 72 [OG] Shi) D 1001 [N] H3 S 72 [OG] 3.61 D 1001 [N] H3 
S 72 [OG] 2.69 D100I[0D1] H3 S 72 [OG] 2.78 D100I[OD1] H3 S 72 [OG] 3.01 D 1001[0D1]_ H3 
S$ 72 [O] 3.50 S 100J [N] H3 S$ 72 [0] 3.69 S$ 100J [N] H3 
S$ 72 [N] 3.43 P 100H [O] H3 S 72 [N] 3.45 P 100H [0] H3 
R 73 [N] 3.45 D100I[OD1] H3 
R 73 [NE] 2.83 $100J [OG] _H3 R 73 [NE] 3.59 $100J[OG] _H3 
£ D 98 [O] 3.51 Y100G[OH] H3 
= R 99 [NH1] 2.83 D100I[OD1] H3 
> R 99 [NH1] 3.5 D1001[0D2]_ H3 
s Fusion R 99 [NH2] 3.64 D100I[0D1] H3 R 99 [NH2] 2.64 D100I[0D2] H3 R 99 [NH2] 2.76 D100I[OD1] H3 
= loop R 99 [NH2] 2.74 D100I[O0D2] H3 R 99 [NH2] 2.83 D100I[OD1] H3 R 99 [NH2] 2.49 D100I[OD1] H3 
G 102 [O] 3.16 $100C[N] H3 G 102 [O] 3.49 S$ 100C [N] H3 G 102 [O] 3.83 S$ 100C [N] H3 
G 102 [O] 3.20 S$100C[OG] H3 G 102 [O] 2.88 S$100C[OG] H3 G 102 [0] 3.02 $100C [OG] H3 
N 103 [N] 3.84 Y 100A [0] H3 
G 104 [O] 3.84 R 98 [NH1] H3 
K 251 [0] 2.83 Y100G[OH] H3 K 246 [O] 2.77 Y100G[OH] H3 K 246 [0] 2.76 Y 100G [0] H3 
ij loop Q 248 [0] 3.23 Y 100F [O] H3 Q 248 [0] 3.70 Y100F[OH] H3 
Q 248 [N] 3.18 _Y100F [OH] _H3 Q 248 [N] 3.85 Y100F[OH] H3 
150 loop V 153 [O] 3.5 $100C [OG] H3 G 152 [O] 3.62 S$100C[OG] H3 
D 154 [N] 3.17 _S$100C [OG] _H3 D154 [N] 3.35 S100C [OG] _H3 
N153-1[N2] 3.43 F 99 [O] H3 N153-1 [N2] 3.28 F 99 [O] H3 
N153/ N153-3 [03] 3.86 Y 100 [OH] H3 
N154 N153-4 [03] 3.89 Y 100 [OH] H3 
glycan N153-4 [03] 3.89 Y 100 [OH] H3 
N153-6[02] 3.54 R 94 [NH2] H3 N153-6 [02] 3.81 R 94 [NH2] H3 
ZIKV sE - EDE1 scFv C8 (5LBS) | DENV-2 sE - EDE1 Fab C8 (4UTA) epitope A | DENV-2 sE - EDE1 Fab C8 (4UTA) epitope B
CDR sE sE dist (Å) scFv C8 CDR sE dist (Å) Fab C8 CDR sE dist (Å) Fab C8 CDR
FL1 E 311 [OE1] 3.28 S$ 30 [OG] FL1 E 311 [OE1] 3.28 S 30 [OG] FL1 
go T 315 [O] 2.58 R66[NH1] FL3 
FL3 2 E 311 [OE1] 3.94 R66[NH1]  FL3 E 311 [OE1] 3.94 R66[NH1] — FL3 
» |= E 311 [OE2] 2.74 R66[NH1]  FL3 E 311 [OE2] 2.74 R66 [NH1] —_FL3 
8 E 311 [OE2] 3.28 R66[NH2] FL3 E 311 [OE2] 3.28 R66[NH2] _FL3 
Ss no] 
: < & K 310 [NZ] 3.69 D 50 [OD1] L2 K 310 [NZ] 3.69 D 50 [OD1] L2 
=: on 
& 5 D 362 [OD2] 3.69 R 54 [NH1] L2 D 362 [OD2] 3.69 R 54 [NH1] L2 
£/'2)° |) 2) k373INzZ] 285  $52[0G L2 
# & | K373(NZ] 2.98 TS53[0G1] 12 
2 Ww K 373 [NZ] 3.56 D 50 [O] L2 
S 70 [O] 3.14 W 94 [NE1 L3 T 70 [O] 2.76 W 94 [NE1] L3 T 70 [0] 2.76 W 94 [NE1] L3 
b strand S$ 72 [O] 3.12 N 93 [ND2 L3 S 72 [O] 3.48 N 93 [ND2] L3 S 72 [O] 3.48 N 93 [ND2] L3 
Q77 [NE2 3.34 Y 92 [OH] L3 Q77 [NE2] 3.46 Y 92 [OH] L3 Q77 [NE2] 3.46 Y 92 [OH] L3 
L3 R 99 [NH2] 3.65 N 93 [0] L3 
Fusion R 99 [NH1 2.87 N 93 [OD1 L3 R 99 [NH1] 3.1 N 93 [OD1] L3 R 99 [NH1] 3.1 N 93 [OD1] L3 
loop R 99 [NH2 3.02 N 93 [OD1 L3 R 99 [NH2] 2.98 N 93 [OD1] L3 R 99 [NH2] 2.98 N 93 [OD1] L3 
G 104 [0] 2.74 N 93 [N] L3 G 104 [0] 2.86 N 93 [N] L3 G 104 [0] 2.86 N 93 [N] L3 
H1 ki loop S 274 [N] 3.3 E 53 [OE2] H1 S 274 [N] 3.3 E 53 [OE2] H1 
M 68 [0] 3.72 S 56 [OG] H2 
M 68 [0] 3.22 A57 [N] H2 T 68 [O] 3.11 A57 [N] H2 T 68 [0] 3.11 A57 [N] H2 
batrand S 70 [N] 3.14 S 56 [OG] H2 T 70 [N] 3.23 S 56 [OG] H2 T 70 [N] 3.23 S 56 [OG] H2 
S 70 [OG] 2.94 S 56 [OG] H2 T 70 [OG1] 2.96 S 56 [OG] H2 T 70 [OG1] 2.96 S 56 [OG] H2 
<| He E 84 [OE1] 3.67 K 64 [NZ] H2 E 84 [OE1] 3.67 K 64 [NZ] H2 
s E 84 [OE2] 3.71 K 64 [NZ] H2 E 84 [OE2] 3.71 K 64 [NZ] H2 
s i R 252 [NH2] 3.37 D55[OD2] H2 K 247 [NZ] 3.84 E 53 [OE1] H2 K 247 [NZ] 3.84 E 53 [OE1] H2 
> nad K247[NZ]_ 2.75 D55[OD2]__H2 || K247[NZ] 2.75 DS55[OD2] _H2 
2 N67 Poraiveontactniereaition N67-1 [03] 3.0 G 65[0] H2 N67-1 [03] 3.0 G 65 [O] H2 
glycan N67-1 [03] 3.0 G 65[0] H2 N67-1 [03] 3.0 G 65 [O] H2 
b-strand | D83[OD1] 3.56 K 64 [NZ] FH3 
FH3 N67 : a, N67-4 [02] 2.70 S$ 82B [OG] FH3 N67-4 [02] 2.70 S$ 82B[OG] FH3 
glycan ine Glial ELT (reciiteln N67-4[02] 3.32. $82B[OG] FH3]| N67-4[02] 3.32 $82B[OG] FH3 
H3 ij loop Q 253 [N] 3.11 Y 100 [OH] H3 Q 248 [N] 3.03 Y 100 [OH] H3 Q 248 [N] 3.03 Y 100 [OH] H3 
Q 253 [O] 3.37 Y 100 [OH] H3 
The polar contacts were computed with the PISA server (http://www.ebi.ac.uk/msd-srv/prot_int/pistart.html). Bold red font denotes main-chain atoms involved in contacts; bold black font denotes salt
bridges; bold blue font denotes glycan interactions. Hydrogen-bond distance cut-off: 3.5 Å; salt-bridge distance cut-off: <4 Å. The green background refers to polar contact distances between 3.5 Å
and 3.9 Å.
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A. Frigeri, M. Giardino, A. Longobardo, G. Magni, E. Palomba, L. A. McFadden, C. M. Pieters, R. Jaumann, P. Schenk,
R. Mugnuolo, C. A. Raymond & C. T. Russell


The typically dark surface of the dwarf planet Ceres is punctuated 
by areas of much higher albedo, most prominently in the Occator 
crater’. These small bright areas have been tentatively interpreted 
as containing a large amount of hydrated magnesium sulfate’, 
in contrast to the average surface, which is a mixture of low- 
albedo materials and magnesium phyllosilicates, ammoniated 
phyllosilicates and carbonates”*. Here we report high spatial and 
spectral resolution near-infrared observations of the bright areas 
in the Occator crater on Ceres. Spectra of these bright areas are 
consistent with a large amount of sodium carbonate, constituting the 
most concentrated known extraterrestrial occurrence of carbonate 
on kilometre-wide scales in the Solar System. The carbonates are 
mixed with a dark component and small amounts of phyllosilicates, 
as well as ammonium carbonate or ammonium chloride. Some of 
these compounds have also been detected in the plume of Saturn’s 
sixth-largest moon Enceladus’. The compounds are endogenous 
and we propose that they are the solid residue of crystallization of 
brines and entrained altered solids that reached the surface from 
below. The heat source may have been transient (triggered by impact 
heating). Alternatively, internal temperatures may be above the 
eutectic temperature of subsurface brines, in which case fluids may 
exist at depth on Ceres today. 

Occator is a prominent, geologically young crater’® (diameter 
~90 km, 19.4° N, 239.0° E), and its brightest region (Fig. 1) is at the
centre of the crater, close to a 10-km-wide central pit®. Several other 
smaller and less bright areas are present to the east on the crater floor. 
Occator has been observed in five spectral images (Fig. 1) from a 
distance of 1,400 km with a spatial resolution of about 380 m per 
pixel by the Visible and InfraRed Mapping Spectrometer (VIR)’ 
onboard the Dawn spacecraft. The analysis of spectra photometrically 
corrected to standard geometry indicates that the reflectance of most 
of Occator’s dark floor (0.03 at 0.55 µm and at 2 µm) is similar to that
of regions around the crater. The highest reflectance is measured in the
central pit with a value of 0.26 at 0.55 µm and 0.28 at 2 µm. A transition
zone with decreasing level of reflectance from the bright areas to the
surroundings can be recognized, associated with a variation in composition
(Fig. 2). The brightest areas show clear spectral differences from
the typical crater floor (Fig. 2). The overall band area between 2.6 µm
and 3.7 µm increases going from the crater floor to the brightest pixels,
and most absorptions become deeper and better defined (Fig. 2a). The
2.7-µm absorption of magnesium (Mg) phyllosilicates typical of Ceres’
surface’ shifts from 2.72 µm to 2.76 µm (Fig. 2b). The 3.07-µm absorption,
attributed to ammonia-bearing species”~‘, is clearly present on the


crater floor but becomes less evident in brighter terrains and is absent 
in the brightest pixels (Fig. 2a and Fig. 3). Absorptions at 2.20-2.22 µm
and around 2.86 µm appear only in the centre of the brightest areas
(Figs 2b and c, and 3). Notably, absorptions near 3.4 µm and 3.9 µm
increase markedly in the brightest areas and are consistent with enrich- 
ments in carbonates (Figs 2d and e, and 3), which have been previously 
identified as a common component of the surface of Ceres”. 

The bright material is clearly different from the rest of the crater floor 
and from the materials that typify most of the surface of Ceres”. The 
dark floor material can be modelled with the same components used 
to describe the overall surface of Ceres” (Extended Data Fig. 1). In contrast,
in high-albedo areas the relative proportions of the components
change, with a clear increase in carbonate content with respect to the
dark material. Furthermore, additional phases appear. The shift of the
metal hydroxide absorption from 2.72 µm to 2.76 µm (Fig. 2b) indicates
a lower abundance of magnesium (Mg)-phyllosilicates and, possibly,
the occurrence of an aluminium (Al)-phyllosilicate such as a smectite,
kaolinite or illite?!°. Hexahydrite (MgSO4·6H2O) was initially suggested
as a major component (30 vol%) of the bright areas on the basis of
spectral variations at visible wavelengths. However, short-wavelength
near-infrared data for such a case would also exhibit strong H2O bands
(Extended Data Fig. 2), which are not observed in the Occator
central pit spectra. Spectral modelling of the VIR data indicates that
hexahydrite in amounts exceeding 2-3 vol% is inconsistent with these
new VIR observations. Mixtures with water ice (Extended Data Fig. 3)
indicate an upper limit of a few volume per cent of water ice.

Strong carbonate bands, typical of anhydrous carbonates!’, dominate
the spectra of Occator bright material near 3.4 µm and 3.9 µm
(Fig. 2). Hydrous carbonates have weak-to-absent 3.4-µm and 3.9-µm
features but strong H2O bands!’, which are not found here. The carbonate
band-centre positions in the Occator spectra agree with those of
natrite (Na2CO3) (Extended Data Fig. 4),
in contrast to Ceres’ average carbonate band centre, which is more
consistent with dolomite CaMg(CO3)2. Moreover, the spectra of
Occator bright material show an absorption that extends to shorter
wavelengths in the 3.2-3.5-µm range with respect to that of Mg,
Ca, Fe and Na carbonates, indicating the possible presence of other
mineralogical phases. Many overlapping overtones and combinations
of fundamental modes of CO₃²⁻ or X-H (for example, CH and OH)
vibrations occur in the 3.2-3.5-µm spectral range. Carbonaceous chondrites
show organic-related absorptions in the 3.3-3.5-µm range, but
these absorptions do not extend to the shorter wavelengths observed in
Occator bright material (Extended Data Fig. 5). However, uncertainties
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Figure 1 | Occator crater. a, Mosaic of Occator crater obtained with VIR 
data at 2 µm. The continuous colour from white to dark grey corresponds
to reflectance from high (0.28) to low (0.03). b, Enlargement of the 
brightest central area. The coloured circled numbers correspond to the 
spectra in Fig. 2. c, Context image obtained with the Dawn Framing 
Camera during the High Altitude Mapping Orbits. d, Dawn Framing 
Camera image of the secondary bright areas acquired during the 375-km 
Low Altitude Mapping Orbits. 


in the instrument calibration in the 3.2-3.4-µm spectral region call for
caution when interpreting band shapes, and this part of the spectra will 
not be emphasized in this analysis. 

The absorption at around 2.2 µm can be due to a narrow hydroxide
(OH) stretching and bending combination vibration in Al-phyllosilicates!”
or to ammoniated minerals that also have bands at around 2.21 µm
(refs 13-15), such as NH4Cl and NH4HCO3 (Extended Data Fig. 6). The
brightest regions in Occator (Fig. 3) are those where the 2.21-µm and the
3.9-µm bands are stronger and the 3.07-µm band is weaker, as shown
by the spectra and correlation trends (Figs 2 and 3d and e). Where the
ammonium phyllosilicate band is weak or absent, another ammonium-bearing
mineral (2.21-µm band) may be present, and the carbonate
band at 3.9 µm is strongest.

Using these phase identifications, we modelled the Occator brightest
material with a mixture of sodium carbonate (Na2CO3), low-albedo
material, illite or montmorillonite, and a small quantity of ammonium-bearing
species, including NH4Cl or NH4HCO3 (Fig. 4). A common
result of these mixture model fits is that carbonates are the most abundant
species (roughly 45-80 vol%, depending on the end-member spectra;
see Extended Data Tables 1 and 2, and Methods). A small quantity
of ammonium chlorides or bicarbonates and Al-phyllosilicate (a few
volume per cent) is also needed in these models to account for the
band at 2.21 µm and the overall spectral shape (Extended Data Fig. 7).

Delivery of exogenous materials cannot account for the Occator
bright areas, because the spectra are unlike those of other asteroids and comets.
The morphologies of the bright areas (nearly circular shapes and correlation
with fractures) argue against a direct impact origin, although
the central mound of Occator and the presence of bright material along fractures
(Fig. 1d) suggest that their emplacement may be related to the impact
that formed Occator. However, it is unclear whether the bright material
in Occator was formed from local aqueous processing triggered by the 
impact or if it represents an exposure of deeper material that found 
its way to the surface via fractures generated by that impact, or some 
combination of those two processes. In either case, these materials were 
derived from the interior of Ceres. 

The central bright area in Occator is especially rich in carbonate, 
and it appears to represent the most concentrated known occurrence 
of km-scale carbonates beyond the Earth. Carbonates also occur in 
carbonaceous chondrites but in amounts of only a few volume per cent. 
Calcite, aragonite and dolomite together constitute <3 vol% of CM'* 
and CI'’ chondrites. All these carbonates formed by aqueous alteration 
on their parent bodies within several Ma after formation of the solar 
system!®. Despite the complexity of the carbonate mineralogy in CM 
and CI chondrites, no natrite has been reported. 

On Ceres, the occurrence of ammonium salts, as well as the unex- 
pectedly high proportion of carbonates in Occator bright materials, 
points to a formation mechanism that is distinct from that which 
produces meteoritic carbonates. On Earth natrite is a magmatic 
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Figure 2 | Spectra of bright and dark areas in Occator. a, Reflectance 
spectra from the circled regions in Fig. 1a. Gaps correspond to removed
instrumental artefacts or saturated channels. b–e, Continuum-removed
spectra from selected wavelength regions. A 2.2-µm absorption is visible in
the spectra of the brightest pixels. The 2.7-µm absorption shifts longward
and an absorption at 2.87 µm appears, going from the crater floor to the
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brightest pixels. The 3.05-µm absorption weakens, while the 3.4-µm and
3.9-µm absorptions strengthen in the brightest areas. The shaded region
indicates the band positions and widths for anhydrous carbonates. Dashed 
lines indicate absorption positions in dark materials and solid lines 
indicate absorption positions in bright materials. 
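Continuum removal, as used for panels b–e, can be illustrated with a short Python sketch. It assumes a straight-line continuum between two shoulder wavelengths and defines band depth as one minus the minimum of the continuum-removed reflectance; the shoulder positions, function names and reflectance values below are hypothetical and are not VIR data, and the continuum shape actually used for the figure is not stated in the text.

```python
def interp(wavelengths, values, x):
    """Linearly interpolate values at wavelength x (wavelengths ascending)."""
    for i in range(len(wavelengths) - 1):
        x0, x1 = wavelengths[i], wavelengths[i + 1]
        if x0 <= x <= x1:
            t = (x - x0) / (x1 - x0)
            return values[i] + t * (values[i + 1] - values[i])
    raise ValueError("wavelength outside the sampled range")


def continuum_removed(wavelengths, reflectance, left, right):
    """Divide the spectrum between two shoulders by a straight-line continuum."""
    r_left = interp(wavelengths, reflectance, left)
    r_right = interp(wavelengths, reflectance, right)
    removed = []
    for w, r in zip(wavelengths, reflectance):
        if left <= w <= right:
            t = (w - left) / (right - left)
            continuum = r_left + t * (r_right - r_left)
            removed.append((w, r / continuum))
    return removed


def band_depth(removed):
    """Band depth = 1 - minimum of the continuum-removed reflectance."""
    return 1.0 - min(value for _, value in removed)


# Synthetic spectrum around a 3.9-µm-like band (hypothetical numbers).
wl = [3.6, 3.7, 3.8, 3.9, 4.0, 4.1, 4.2]
refl = [0.280, 0.270, 0.240, 0.210, 0.240, 0.270, 0.280]

cr = continuum_removed(wl, refl, left=3.6, right=4.2)
depth = band_depth(cr)
```

With these synthetic values the continuum is flat, so the depth reduces to 1 − 0.21/0.28.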
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Figure 3 | Spatial distribution and scatter plots of different absorption 
intensities. a–c, The ratio between the reflectance at 3.6 µm and that at
3.9 µm (a) provides a proxy of the 3.9-µm band depth, the ratio between
the reflectance at 3.0 µm and that at 3.07 µm (b) provides a proxy for the
3.07-µm band depth, and the ratio between the reflectance at 1.98 µm and
that at 2.2 µm (c) provides a proxy of the 2.2-µm band depth. The colour
scale corresponds to the band depth (from blue at the minimum to yellow
at the maximum). d, e, Scatter plots of band ratios show that the 3.9-µm
band strength is correlated with the 2.21-µm band and anti-correlated with
the 3.07-µm band depth. The 2.2-µm band is also anti-correlated with the
3.07-µm band.
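The band-depth proxies defined in this caption are plain reflectance ratios. A minimal Python sketch follows, assuming a spectrum stored as parallel lists of wavelengths (µm) and reflectances; the function names and the sample spectrum are hypothetical and are not VIR data, and only the three wavelength pairs come from the caption.

```python
def reflectance_at(wavelengths, reflectance, x):
    """Linearly interpolate the reflectance at wavelength x (ascending grid)."""
    for i in range(len(wavelengths) - 1):
        x0, x1 = wavelengths[i], wavelengths[i + 1]
        if x0 <= x <= x1:
            t = (x - x0) / (x1 - x0)
            return reflectance[i] + t * (reflectance[i + 1] - reflectance[i])
    raise ValueError("wavelength outside the sampled range")


def band_ratio(wavelengths, reflectance, reference_um, band_um):
    """Ratio R(reference) / R(band); values above 1 indicate an absorption."""
    return (reflectance_at(wavelengths, reflectance, reference_um)
            / reflectance_at(wavelengths, reflectance, band_um))


# (reference wavelength, band wavelength) pairs from the caption, in µm.
PROXIES = {"3.9": (3.6, 3.9), "3.07": (3.0, 3.07), "2.2": (1.98, 2.2)}

# Hypothetical bright-pixel spectrum, for illustration only.
wl = [1.98, 2.20, 3.00, 3.07, 3.60, 3.90]
refl = [0.30, 0.29, 0.20, 0.20, 0.28, 0.21]

ratios = {name: band_ratio(wl, refl, ref, band)
          for name, (ref, band) in PROXIES.items()}
```

In this synthetic case the 3.9-µm ratio is well above 1 while the 3.07-µm ratio equals 1, mimicking the anti-correlation described for the brightest pixels.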
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Figure 4 | Spectral fits of the Occator bright material spectrum. Results 
of the spectral fitting model (red line) using: a, dark material, natrite, illite 
and ammonium chloride; b, dark material, natrite, illite and ammonium 
bicarbonate; c, dark material, natrite, montmorillonite and ammonium 
chloride; and d, dark material, natrite, montmorillonite and ammonium 
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bicarbonate. Values of the computed χ² for each mixture are given in the
plots. Error bars for the Occator spectrum (black) are calculated taking 
into account a mean absolute deviation of the calibration uncertainties 
along the 256 samples. The end-members are described in Extended Data 
Table 1 and the retrieved abundances in Extended Data Table 2. 
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mineral in certain carbonatites, but this origin is improbable on Ceres. 
Alternatively, natrite is found in terrestrial alkaline hydrothermal and 
evaporitic environments, and it has also been detected in Enceladus’ 
plumes°, along with NaHCO3 and NaCl. For the case of CI-chondritic
type bodies, models for ammonia-free water-chondrite interaction!?° 
suggest that Na-carbonate may be formed, but that the abundance of the 
latter is dominated by that of NaCl. In this respect, we note that NaCl is 
featureless in the spectral range observed by VIR, but that if present in 
the bright deposits of Occator, its abundance is small compared to that 
of carbonate. For this reason, we consider the role of ammonium that is 
known to compensate the acidifying action of dissolved CO2, keeping 
pH high enough for the stability of carbonate or bi-carbonate ions in 
the fluids. Indeed, if sodium carbonate dominates sodium chloride in 
the deposits on Ceres, as suggested here, a chemical pathway similar to 
that used to produce industrial Na2CO3 on Earth (the Solvay process)
may be at work, where dissolved NaCl, CO2, and ammonia are used to 
precipitate sodium bicarbonate, producing ammonium chloride and/ 
or bicarbonate as possible reaction products. Support for this sort of 
chemical scenario on Ceres is provided by the ubiquitous presence of 
ammonium-bearing minerals on Ceres®, the presence of minor ammo- 
nium salts in Occator, and the fact that abundant carbon is expected 
on bodies that resemble carbonaceous chondrites in bulk composition. 

Occator is a fresh crater, possibly 100 million years or younger in 
age®, so the bright spots should be of similar or younger age. However, 
the source of modern fluids on Ceres remains an open question. When 
dissolved in water, NaHCO3 (the precursor of Na2CO3) has a eutectic
point of ~267 K, while NH4Cl and NH4HCO3 have eutectics at ~251 K
and ~256 K, respectively (see Methods). These eutectic temperatures lie
above the maximum temperature predicted for the outermost 100 km
or so of Ceres’ crust, in which case these materials should remain 
solid*!. The occurrence of carbonates and other salt assemblages in 
Occator may thus point to internal temperatures that are warmer than 
predicted by the models (for example, because of the low thermal con- 
ductivities of salts). Alternatively, an external heat source may have 
been involved, such as impact-induced heating. Indeed, estimates 
suggest that the impact that formed Occator could have increased the 
surface temperature up to that of the melting point of water ice’, hence 
encompassing the eutectics of the aforementioned species. 

The morphology of the bright areas indicates an association with frac- 
ture systems that may have facilitated brine upwelling (Fig. 1d). These 
fractures were either created by impacts, or they may be associated with 
subsequent internal movements. In either case, upon ascent and expo- 
sure the solute-bearing fluid containing entrained altered solids froze, 
causing precipitation and concentration of carbonates and salts. The 
detection of abundant sodium carbonate, albeit in localized regions at 
the surface of Ceres, provides constraints on Ceres’ chemical evolution 
and indicates that aqueous alkaline solutions could even persist in Ceres’ 
subsurface to the present day. In addition, these regions bear similarities 
with Enceladus, where ammonia, NaCl, NaHCO3 and Na2CO3 have been
detected in plumes”’. These observations point to Ceres as an object that 
has experienced aqueous processes in the recent geological past involving 
materials similar to those predicted or observed on icy satellites5,20,23,24,
confirming the link of Ceres with the bodies of the outer Solar System”. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
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METHODS 


Composition analysis. The spectra analysed here have been corrected for standard 
geometry (incidence angle 30°, emission angle 0°, phase angle 30°) to derive the 
values of reflectance. In particular, the central Occator bright spectrum reported in 
the main text is derived as an average of the four brightest spectra acquired during 
acquisition number 498468846 with the following viewing geometry: phase angle 
45°, incidence angle 18°, emission angle 41°. The resulting spectrum has also been 
corrected for thermal emission. 

We model Ceres’ average spectrum as an intimate mixture of different 
end-members, by means of Hapke theory”’, which characterizes light scattering 
in particulate media. The optical constants are derived from reflectance spectra 
as in ref. 26. Abundances and grain size of the components are free parameters, 
and a best fit is obtained by means of a least-squares optimization algorithm?”-”’. 
The model accounts for the viewing geometry (incidence, emission and phase 
angle) which is calculated according to the shape model, the spacecraft attitude, 
and the latitude and longitude of a given pixel on the surface. The model also 
accounts for the single particle phase function, which is a free parameter in the 
fitting procedure. 
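The structure of such an intimate-mixture fit can be sketched in Python under strong simplifications. The sketch below assumes isotropic scatterers, no opposition effect, and Hapke's analytic approximation to the H-function; it takes the single-scattering albedos of the end-members as given instead of deriving them from optical constants and grain size, and it replaces the least-squares optimizer with a brute-force scan over one mixing fraction. All function names and albedo values are hypothetical, and this is not the code used for the fits reported here.

```python
import math


def h_function(x, w):
    """Hapke's analytic approximation to the Chandrasekhar H-function."""
    gamma = math.sqrt(1.0 - w)
    return (1.0 + 2.0 * x) / (1.0 + 2.0 * gamma * x)


def hapke_reflectance(w, inc_deg=30.0, emi_deg=0.0):
    """Bidirectional reflectance for single-scattering albedo w.

    Assumes isotropic scatterers and no opposition effect. The default
    angles are the standard geometry quoted in the Methods.
    """
    mu0 = math.cos(math.radians(inc_deg))
    mu = math.cos(math.radians(emi_deg))
    return (w / (4.0 * math.pi)) * mu0 / (mu0 + mu) \
        * h_function(mu0, w) * h_function(mu, w)


def mixture_albedo(fractions, albedos):
    """Cross-section-weighted single-scattering albedo of an intimate mixture."""
    total = sum(fractions)
    return sum(f * w for f, w in zip(fractions, albedos)) / total


def model_spectrum(f_bright, w_bright, w_dark):
    """Model reflectance per channel for a two-component mixture."""
    return [hapke_reflectance(mixture_albedo([f_bright, 1.0 - f_bright], [wb, wd]))
            for wb, wd in zip(w_bright, w_dark)]


def fit_fraction(observed, w_bright, w_dark, steps=1000):
    """Scan the bright-component fraction and keep the least-squares best."""
    best_f, best_sse = 0.0, float("inf")
    for k in range(steps + 1):
        f = k / steps
        model = model_spectrum(f, w_bright, w_dark)
        sse = sum((o - m) ** 2 for o, m in zip(observed, model))
        if sse < best_sse:
            best_f, best_sse = f, sse
    return best_f, best_sse


# Hypothetical per-channel single-scattering albedos of two end-members.
w_bright = [0.90, 0.80, 0.60, 0.85]
w_dark = [0.10, 0.10, 0.12, 0.11]

# Synthetic "observation" generated with a 70% bright cross-section fraction.
observed = model_spectrum(0.7, w_bright, w_dark)
best_f, best_sse = fit_fraction(observed, w_bright, w_dark)
```

The scan recovers the fraction used to generate the synthetic spectrum. A fit with several end-members, free grain sizes and a free phase function, as described above, would need a proper multi-parameter optimizer.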

We take advantage of the modelling performed for the overall surface of Ceres® 
as a starting point for the bright spot composition. A first fit is made by adding 
water ice to the end-members listed in Extended Data Table 1 (magnetite, antigorite, 
NH4-montmorillonite and dolomite). Although the model can roughly account 
for the increase in the depth of the complex 3-µm band, this fit is unsatisfactory.
Absorption bands of carbonates (3.5 µm, 4.0 µm) are poorly fitted; other minor
absorption bands in the measured spectrum (2.2 µm, 2.9 µm) are absent in the
model; conversely, some bands required by the model, like the 2.0-µm absorption
band of water ice, have not been observed. 

As discussed in the main text, the best fit is obtained by changing the kind of car- 
bonate and phyllosilicate with respect to the average composition of the surface” and 
adding ammonium chloride or ammonium bicarbonate (Extended Data Tables 1
and 2). The components used to produce the best fit (Fig. 4) and the references for
the spectra used for the analysis are listed in Extended Data Table 1. The model 
with water ice*?*? does not improve the fits. However, a small amount (1 vol%) 
is still compatible (Extended Data Table 2). The parameters obtained are listed in 
Extended Data Table 2. 

In the case of dark materials, the albedo is weakly linked to grain size, so grain 
size cannot be determined unequivocally. However, it can be constrained to be 
within the range 20-100 µm, which is reasonable for surface regolith. For each
combination of end-members, the best fit is repeated for the two grain size limits 
of the dark material. From the distribution of the results (Extended Data Table 2), 
one can evaluate the mean value and the uncertainties on the retrieved parameters. 

Formation of Occator’s bright area material. The abundant carbonate in Occator’s
bright area requires (i) substantial amounts of carbon in solution, perhaps derived 
from the dissolution of organics, which are found in carbonaceous chondrites, and 
potentially augmented by CO/CO2-bearing ices that may have accompanied the
nitrogen-bearing material from the outer Solar System’; (ii) substantial amounts 
of fluid with dilute carbonate and bicarbonate ions, and/or (iii) a concentration of 
carbonate at or near the surface by migration and evaporation of fluids near the
surface of Ceres, similar to evaporites or ‘caliche’ in terrestrial soils. Ammonium
may be present in the form of ammoniated salts from freezing of the early ocean in 
which the observed surface mineralogy formed, or it may be derived from brines 
interacting with the NH4 present in typical Ceres surface materials”.

Ammonium makes the solution basic, converting dissolved carbonate anions 
into bicarbonate anions, which then readily associate with ammonium cations. 
However, in the presence of dissolved NaCl, the ammonium preferentially com- 
bines with chlorine while sodium combines with bicarbonate. This assemblage can 
be exposed on the surface of Ceres, through fractures or cratering events, following 
emplacement and sublimation of the ice. 

The association of sodium carbonate and ammonium chloride (and/or ammo- 
nium carbonate) described in the Occator bright spots is thus consistent with the 
freezing or evaporation products of a salt solution bearing carbon species and 
ammonium ions. This process is similar to that employed for the synthesis of 
industrial sodium carbonate on Earth™, except that the saturation of the solution 
in CO2 does not require the high-temperature breakdown of calcite or limestone
if alternative sources of carbon are already present in solution. 
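
For reference, the industrial route mentioned above is usually summarized by the following textbook net reactions of the Solvay process. They are given here only as context for the terrestrial analogue; the text proposes a similar pathway for Ceres but does not specify the individual reaction steps.

```latex
\begin{align}
\mathrm{NaCl + NH_3 + CO_2 + H_2O} &\rightarrow \mathrm{NaHCO_3 + NH_4Cl} \\
2\,\mathrm{NaHCO_3} &\xrightarrow{\ \text{heat}\ } \mathrm{Na_2CO_3 + H_2O + CO_2} \\
\mathrm{CaCO_3} &\xrightarrow{\ \text{heat}\ } \mathrm{CaO + CO_2}
\end{align}
```

The third reaction is the high-temperature limestone breakdown that supplies CO2 industrially, which is the step that would not be needed if carbon were already present in solution.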

Code availability. We have opted not to make the code available for fitting the data 
because it is a numerical code developed specifically for this purpose, but the code 
is described in refs 27, 28 and 29. 
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Extended Data Figure 1 | Occator floor spectrum. Spectral fit (red) of the reflectance spectrum of the floor of the Occator crater (black) using the same 
end-members as discussed for the average surface. The computed χ² for the mixture is given in the plot. Error bars for the Occator floor spectrum are
calculated taking into account a mean absolute deviation of the calibration uncertainties along the 256 samples. 
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Extended Data Figure 2 | Spectral fit with hexahydrite. Modelled
spectrum (red) of a mixture of hexahydrite (MgSO4·6H2O) (30 vol%) with
the average for Ceres (70 vol%). Strong absorption bands due to H2O are
visible at 1.4 µm, 1.95 µm, 2.45 µm and 3 µm in the modelled spectrum,
which are not observed in the Occator bright spectrum (black). Error
bars for the Occator spectrum are calculated taking into account a mean
absolute deviation of the calibration uncertainties along the 256 samples.
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Extended Data Figure 3 | Spectral fit with water ice. Spectral fit (red) of
the reflectance spectrum of the bright spot in Occator (black) using the
same end-members discussed for the average surface” and adding water ice
(Extended Data Table 1). Resulting parameters are reported in Extended
Data Table 2. Several absorptions present are poorly fitted, and several
absorptions predicted by the best-fit model are absent. The computed χ²
for the mixture is given in the plot. Error bars for the Occator spectrum are
calculated taking into account a mean absolute deviation of the calibration
uncertainties along the 256 samples.
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Extended Data Figure 4 | Comparison of Occator bright material 
spectrum with carbonates. a, Continuum-removed spectrum of the 
Occator bright material compared to natrite!*, sodium bicarbonate 

(see http://psf.uwinnipeg.ca), calcite, and dolomite*>. b, A scatter plot 
of the longest-wavelength continuum-removed absorption band centres 
for different carbonates shows that Ceres data from Occator bright areas 
are similar to data from natrite and are distinct from data from other 
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parts of the planet, which plot near magnesite. The 3.9-µm absorption is
strong in both the Occator bright areas and in dark floor material. The
3.4-µm absorption is strong in the bright areas but broader; its centre is
challenging to define over most of the Ceres surface owing to the presence
of other optically active phases. Spectral sampling of laboratory data
varied, but in all cases was <0.01 µm. Error bars for the Occator spectrum
are not reported in the plot. 
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Extended Data Figure 5 | Comparison of Occator bright material spectrum with carbonaceous chondrites. Comparison of spectra from two 
carbonate- and organic-bearing carbonaceous chondrites (MAC 02606 (CM2) and Ivuna (CI))*° and the Occator bright material. The spectra have
been normalized to 1 at 2.62 µm. The Occator spectrum is not reported in the plot.
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Extended Data Figure 6 | Spectra of candidate materials fitting the
observed 2.20-2.22-µm absorption in Occator bright materials.
The shaded grey area indicates the expected bandwidth of aluminium
phyllosilicates*® (see the PSF web site at http://psf.uwinnipeg.ca and the
RELAB database at http://www.planetary.brown.edu/relab/), and the
shaded red area indicates the expected bandwidth of NH4Cl (ref. 15). The
Occator spectrum is 20× contrast-enhanced. The magnesium-exchanged
montmorillonite is heated to 300 °C (ref. 37). Spectra of ammoniated salts
NH4Cl, (NH4)2CO3 and NH4HCO3 (ref. 15) are also plotted. Dotted lines
correspond to ammonium absorptions near 2 µm.
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Extended Data Figure 7 | Spectral fit without NH4 salt. Spectral fit (red)
of the reflectance spectrum of the bright spot in Occator (black) using
natrite, montmorillonite, illite and dark material as end-members
(Extended Data Table 1). Resulting parameters are reported
in Extended Data Table 2. The computed χ² for the mixture is given in
the plot. Error bars for the Occator spectrum are calculated taking into
account a mean absolute deviation of the calibration uncertainties along
the 256 samples.
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Extended Data Table 1 | End-members used in evaluating mixing model results 


Mineral¹               Sample ID       Spectrum ID
Murchison IOM          OG-CMA-002      BKR1OG002
natrite (1)            CB-EAC-034-C    LACB34C
natrite (2)            CB-EAC-079-A    BKR1CB079A
illite                 IL-EAC-001      LAIL01
montmorillonite        EA-EAC-028-A    BKR1EA028A
NH4Cl                  CL-EAC-049-A    LACL49A
NH4HCO3                CB-EAC-041-B    LACB41B
NH4-montmorillonite    JB-JLB-189      397F189
dolomite               CB-EAC-003      LACB03A
hexahydrite            SF-EAC-057A     LASF57A
water ice


¹Spectra have been selected from the RELAB database (http://www.planetary.brown.edu/relab/). Natrite (1) and (2) are both used in the fitting procedure. IOM, insoluble organic matter.
Optical constants for water ice are from refs 30-33.
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Extended Data Table 2 | Combination of end-members used to produce the best fit 


End-members    Cross section (%)    Grain size (µm)    Volume (%)    χ²    Figure

water ice 7-8 
calcite 100 - 100 7.01 - 6.05 Fig. 3 
NHa-montmorillonite 50 - 50 extended 
antigorite 100 - 100 data 
dark material 20 - 100 
natrite 54 - 49 Fig. 7 
montmorillonite 100 - 100 8.25-7.24 | extended 
illite 5-5 data 
dark material 20 - 100 
natrite 57 - 63 
ammonium bicarbonate 59-35 1.72 - 1.59 Fig. 4 
montmorillonite 5-5 main text 
dark material 20 - 100 
natrite 48-51 
ammonium bicarbonate 74-47 1.37 - 1.20 Fig. 4 
illite 8-8 main text 
dark material 20 - 100 
natrite 83 - 84 
ammonium chloride 26-15 1.63 - 1.45 Fig. 4 
montmorillonite 3-5 main text 
dark material 20 - 100 
natrite 
ammonium chloride 1.46 - 1.11 Fig. 4 
illite main text 
dark material 
water ice 
natrite 1.36 - 1.05 
ammonium chloride 
illite 
dark material 


The cross-section fraction and the grain size are free to vary in order to obtain the best fit. For each combination we obtain two different solutions related to the grain size of the dark material 
(20 µm and 100 µm), which is the only parameter fixed in the fitting procedure. The volume fraction is obtained according to the cross-section and the grain size of each end-member.
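
One common way to convert cross-section fractions into volume fractions can be sketched as follows. The sketch assumes spherical grains of a single diameter per end-member, so that N grains contribute a cross-section proportional to N·D² and a volume proportional to N·D³, which makes the volume share proportional to (cross-section fraction) × D. This relation, the function name and the numbers are assumptions for illustration; the exact conversion used for the table is not spelled out in the text.

```python
def volume_fractions(cross_section_fractions, grain_sizes_um):
    """Convert cross-section fractions to volume fractions.

    Assumes spherical grains: volume share is proportional to F_i * D_i.
    """
    weights = [f * d for f, d in zip(cross_section_fractions, grain_sizes_um)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical two-component example: equal cross-section shares,
# but one component has 100-µm grains and the other 20-µm grains.
fractions = volume_fractions([0.5, 0.5], [100.0, 20.0])
```

Here the coarse-grained component ends up with five-sixths of the volume despite covering only half of the cross-section, which shows why the two columns of the table can differ substantially.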
Controlling charge quantization with quantum fluctuations

S. Jezouin, Z. Iftikhar, A. Anthore, F. D. Parmentier, U. Gennser, A. Cavanna, A. Ouerghi, I. P. Levkivskyi, E. Idrisov,
E. V. Sukhorukov, L. I. Glazman & F. Pierre


In 1909, Millikan showed that the charge of electrically isolated 
systems is quantized in units of the elementary electron charge e. 
Today, the persistence of charge quantization in small, weakly 
connected conductors allows for circuits in which single electrons 
are manipulated, with applications in, for example, metrology, 
detectors and thermometry1–5. However, as the connection
strength is increased, the discreteness of charge is progressively 
reduced by quantum fluctuations. Here we report the full quantum 
control and characterization of charge quantization. By using 
semiconductor-based tunable elemental conduction channels to 
connect a micrometre-scale metallic island to a circuit, we explore 
the complete evolution of charge quantization while scanning the 
entire range of connection strengths, from a very weak (tunnel) 
to a perfect (ballistic) contact. We observe, when approaching the 
ballistic limit, that charge quantization is destroyed by quantum 
fluctuations, and scales as the square root of the residual probability 
for an electron to be reflected across the quantum channel; this 
scaling also applies beyond the different regimes of connection 
strength currently accessible to theory®*. At increased temperatures, 
the thermal fluctuations result in an exponential suppression of 
charge quantization and in a universal square-root scaling, valid 
for all connection strengths, in agreement with expectations’. 
Besides being pertinent for the improvement of single-electron 
circuits and their applications, and for the metal-semiconductor 
hybrids relevant to topological quantum computing’, knowledge 
of the quantum laws of electricity will be essential for the quantum 
engineering of future nanoelectronic devices. 

Some of the most fundamental theoretical predictions have so far 
eluded experimental confirmation. Charging effects are generally found
to diminish as the conductances of the contacts are increased10–18;
however, although some measurements support the fundamental
prediction®* that charge quantization vanishes in the presence 
of one ballistic channel’*-'*', others conclude the opposite!**?. 
Unsurprisingly, the scaling behaviour predicted for the reduction of 
charge quantization®® has also remained elusive, until now, despite 
several attempts16,17.

A plausible explanation of the varying results regarding the charge 
quantization criteria is that, in the previously investigated devices, the 
quantum channels and the conductor were not completely distinct 
circuit elements. With a small island, in which the density of states is 
discrete, the non-local electronic wave functions merge the connected 
channels and the island into a complex quantum conductor, where 
Coulomb interactions may play a non-trivial role. As a result, charging 
effects can develop even if one of the conduction channels taken 
separately is perfectly ballistic. This phenomenon is called mesoscopic 
Coulomb blockade!®??4, 

Investigating charge quantization at the most elemental single-channel
level therefore requires tunable conduction channels linked
to a conductor with a negligible electronic level spacing. Although this
can be realized by making the island larger, its size must remain small
enough to preserve charge quantization. Indeed, thermal fluctuations
average out charge quantization unless the charging energy associated
with the addition of one electron in the island, E_C = e²/2C (where the
geometrical capacitance of the island C increases with size), is larger
than the thermal energy k_BT, with k_B the Boltzmann constant and
T the temperature1.

We have solved these conflicting requirements with the hybrid 
metal-semiconductor single-electron transistor (SET) shown in 
Fig. 1a, implementing the schematic circuit of Fig. 1b: a central metallic 
island with a continuous density of states (coloured red in Fig. 1a, b) is 
connected to large electrodes (represented by white disks) through two 
Ga(Al)As quantum point contacts (QPC_L,R) that emulate single-channel
quantum conductors over the entire range of coupling strengths. 

The metallic island, which is made of a metallic AuGeNi alloy, has 
a negligible electronic level spacing δ ≈ k_B × 0.2 µK, five orders of
magnitude smaller than the base electronic temperature T ≈ 17 mK.
It is galvanically connected, by thermal annealing, to a 105-nm-deep, 
Ga(Al)As, high-mobility two-dimensional electron gas (2DEG; darker 
grey areas delimited by bright lines in Fig. 1a). Achieving an almost 
perfectly transparent metal-2DEG electrical contact is crucial to reach 
the ballistic channel limit. Remarkably, the reflection probability of 
electrons at the interface is below 0.05%. 

The QPCs are located in the 2DEG and tuned by field effect with the 
voltage applied to capacitively coupled metallic split gates (coloured 
green in Fig. 1a; the top-right split gates that are coloured yellow are 
negatively biased to remove the 2DEG underneath). Besides tuning, 
the precise characterization of each QPC, independently, is necessary 
for the quantitative exploration of charge quantization versus 
connection strength. However, in the SET configuration, the QPC 
conductances are interconnected and renormalized by Coulomb 
blockade. Moreover, only their series combination is accessible. To 
completely characterize QPC_L,R, we implemented with adjacent gates
(coloured blue in Fig. 1a) the on-chip switches shown in Fig. 1b. The
measured quantities τ_L,R = G^∞_L,R h/e² (with h the Planck constant and
G^∞_L,R the conductances of QPC_L,R when switches are closed (inset of
Fig. 1c)) directly give the ‘intrinsic’ (not renormalized by Coulomb
blockade) electron transmission probabilities of the constitutive
quantum channels, which fully characterize the connection strength to
the metallic island. As illustrated in Fig. 1c, τ_L(R) < 1 corresponds to a
single (spin-polarized, see below) channel of transmission probability
τ_L(R) across QPC_L(R). For 1 < τ_R < 2, there are two channels across
QPC_R: one fully ballistic and the other with transmission probability
τ_R − 1. With this approach, we achieve an accuracy down to 0.1% near
the ballistic limit.
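
As a minimal sketch of the channel decomposition described above, the measured τ = Gh/e² can be split into per-channel transmission probabilities. The function name `decompose_channels` is illustrative and not taken from the paper, and only the range 0 ≤ τ ≤ 2 discussed in the text is covered.

```python
from typing import List


def decompose_channels(tau: float) -> List[float]:
    """Split a measured tau = G*h/e^2 into per-channel transmissions.

    Assumes the two spin-polarized channels open one after the other,
    so only 0 <= tau <= 2 is handled in this sketch.
    """
    if not 0.0 <= tau <= 2.0:
        raise ValueError("this sketch only covers 0 <= tau <= 2")
    if tau <= 1.0:
        return [tau]             # a single, partially transmitted channel
    return [1.0, tau - 1.0]      # one ballistic channel plus a partial one


print(decompose_channels(0.24))
print(decompose_channels(1.5))
```

The second call corresponds to the τ_R = 1.5 setting used later, where one channel is fully ballistic.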

The sample is immersed in a perpendicular magnetic field
B ≈ 4 T, which corresponds to the integer quantum Hall effect at filling
Figure 1 | Tunable quantum connection to a metallic island. a, Coloured
sample micrograph. A micrometre-scale metallic island (red) is connected
to large electrodes (white circles) through two quantum point contacts
(QPCs, green split gates) formed in a buried two-dimensional electron gas
(2DEG; darker grey delimited by bright lines). The lateral gates (blue)
implement short-circuit switches as shown in b. The yellow gates, tuned
at V_g negative enough to deplete the 2DEG underneath, are capacitively
coupled to the island and used to evidence charge quantization. In the
applied field B ≈ 4 T, the current propagates along two edge channels
(red lines) in the direction indicated by arrows. b, Sample schematic;
colours as in a; Q represents the excess charge that can accumulate on the

factor ν = 2. In this regime, the electrical current propagates along two
edge channels (shown as a single red line in Fig. 1a) in the direction
indicated by arrows, which does not influence charge quantization (for
a specific discussion see Methods section ‘Conductance in the
near-ballistic regime with strong thermal fluctuations’). The large Zeeman
splitting results in the full separation between the successive openings of
the two spin-polarized quantum channels across the QPCs (Fig. 1c).

Charge quantization in the central island is unequivocally evidenced
by periodic oscillations of the SET differential conductance G_SET (across
QPC_L-island-QPC_R) when sweeping a capacitively coupled gate
voltage, which develop into Coulomb diamonds with d.c. bias voltage
V_dc (Fig. 1d). With both QPCs in the tunnel regime, τ_L,R ≪ 1, the span
of the diamonds in V_dc gives the charging energy E_C ≈ k_B × 0.3 K
(C ≈ 3.1 fF).
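
The quoted capacitance and charging energy can be cross-checked with E_C = e²/2C. The short sketch below uses the SI values of the elementary charge and the Boltzmann constant; the function name is illustrative.

```python
E_CHARGE = 1.602176634e-19   # elementary charge, in coulombs
K_B = 1.380649e-23           # Boltzmann constant, in joules per kelvin


def charging_energy_kelvin(capacitance_farad: float) -> float:
    """Return E_C / k_B in kelvin for E_C = e^2 / (2 C)."""
    return E_CHARGE ** 2 / (2.0 * capacitance_farad) / K_B


ec_kelvin = charging_energy_kelvin(3.1e-15)   # C = 3.1 fF from the text
ratio_at_base = ec_kelvin / 0.017             # compare with T = 17 mK
print(f"E_C/k_B = {ec_kelvin:.2f} K, E_C/(k_B T) = {ratio_at_base:.1f}")
```

With C ≈ 3.1 fF this gives E_C/k_B close to 0.3 K, well above the 17 mK base temperature, which is the condition for thermal fluctuations not to average out charge quantization.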

We first probe the evolution of charge quantization with transmission 
probability directly from the raw periodic modulations of G_SET. Figure 2a
metallic island. c, The ‘intrinsic’ (switch closed; see inset schematic)
conductance G^∞_L,R across QPC_L,R (shown top-right and bottom-left,
respectively, in a) is shown versus split gate voltage as black (L) and
red (R) lines. Symbols indicate the set-points of QPC_L used thereafter.
The number and transmission probabilities of electronic channels through
the QPC (pair of green triangles) are schematized for τ_R < 1 and τ_R > 1:
a dashed (solid) red line represents a partially (perfectly) transmitted
channel. d, Coulomb diamond patterns in the device conductance G_SET
(larger shown brighter, from 0 in dark blue up to 0.13e²/h in white)
measured versus gate (V_g) and bias (V_dc) voltages for tunnel contacts
τ_L,R ≪ 1.


displays G_SET measured at T ≈ 17 mK and V_dc = 0 while sweeping
the capacitively coupled gate voltage V_g (Fig. 1a), for QPC_L fixed to
τ_L = 0.24 and with each panel corresponding to a different QPC_R tuning
(τ_R = 0.1, 0.6, 0.88, 0.98 and 1.5, from left to right). These raw data
reveal the remarkable robustness of charge quantization to connection
strength. At τ_R = 0.1 and τ_R = 0.6, the presence of sharp periodic
peaks separated by intervals in which G_SET ≈ 0 signals an essentially
unaltered charge quantization over the greater part of transmission
probabilities. Although G_SET(δV_g) progressively evolves with increasing
τ_R < 1 into a sinusoid with non-zero minima, sizeable
modulations of fixed (τ_R-independent) period persist very close to the
ballistic limit, at τ_R = 0.98. In stark contrast, G_SET is independent of
V_g at τ_R = 1.5, confirming the predicted complete collapse of charge
quantization in the presence of a fully ballistic channel. Note that G_SET
remains reduced by Coulomb interactions, even at τ_R = 1.5, as evidenced
by the pronounced conductance dip at low V_dc (inset of Fig. 2b).
Figure 2 | Charge quantization versus connection strength at
T ≈ 17 mK. a, Conductance sweeps G_SET(δV_g) with fixed τ_L = 0.24 and
varying τ_R = 0.1, 0.6, 0.88, 0.98 and 1.5, from left to right, as indicated.
b, Visibility of G_SET oscillations ΔQ = (G_SET^max − G_SET^min)/(G_SET^max + G_SET^min) versus
τ_R, with each set of symbols corresponding to a different QPC_L set-point
τ_L, as indicated, corresponding to those indicated by the matching symbols
Figure 3 | Charge quantization scaling near the ballistic critical point.
The ΔQ data at T ≈ 17 mK are displayed versus 1 − τ_R on a log-log scale,
with different symbols for the different QPC_L set-points, as in Fig. 2. Solid
lines are quantitative predictions (no fit parameters) derived assuming
k_BT ≪ E_C, 1 − τ_R ≪ 1 and either τ_L ≪ 1 (top (black) line) or 1 − τ_L ≪ 1
(bottom three (purple, green and orange) lines). The power law
ΔQ ∝ √(1 − τ_R) (straight, dashed lines) is systematically observed for
1 − τ_R ≲ 0.02 and at intermediate τ_L. The horizontal error bars arise from
the dispersion of at least 40 transmission settings; the vertical error bars
are calculated from the statistical uncertainty of about 10 measurements
of one period of G_SET (see Methods).
in Fig. 1c. Inset, dynamical Coulomb blockade renormalization of G_SET
versus d.c. voltage V_dc in the absence of charge quantization, at τ_L = 0.24
and τ_R = 1.5. The error bars are the standard error on the mean value of
ΔQ, obtained from the statistical uncertainty of about ten measurements
of G_SET (see Methods).


Indeed, the so-called dynamical Coulomb blockade does not rely on 
a quantized island charge, but results from the discreteness of charge 
transfers across non-ballistic channels!”. 

The degree of charge quantization versus connection strength is
characterized, separately from the dynamical Coulomb blockade
renormalization of the channels, by focusing on the visibility of the periodic
modulations ΔQ = (G_SET^max − G_SET^min)/(G_SET^max + G_SET^min), with G_SET^max(min) the
maximum (minimum) SET conductance over one gate-voltage period
and, from now on, V_dc = 0. A visibility ΔQ = 1 (ΔQ = 0) signals full
charge quantization (its absence). Moreover, the visibility ΔQ is
directly proportional to the charge oscillations of the island with gate
voltage (that is, charge quantization) when one channel approaches the
ballistic limit (for example, Tz — 1)”?°-?7. As put forward in ref. 26, this 
proportionality coefficient reduces to the numerical factor ξ ≈ 1.59
for τ_L ≪ 1 and k_BT ≪ E_C.
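
The visibility defined above can be evaluated directly from a conductance sweep covering one gate-voltage period. The following is a minimal sketch; the function name and the example values are illustrative, not measured data.

```python
from typing import Sequence


def visibility(g_set: Sequence[float]) -> float:
    """(G_max - G_min) / (G_max + G_min) over one gate-voltage period."""
    g_max, g_min = max(g_set), min(g_set)
    if g_max + g_min == 0.0:
        raise ValueError("conductance is zero everywhere; visibility undefined")
    return (g_max - g_min) / (g_max + g_min)


# Illustrative sweeps (arbitrary units), not data from the experiment:
sharp_peaks = [0.0, 0.0, 0.1, 0.0, 0.0]   # conductance vanishes between peaks
flat_sweep = [0.3, 0.3, 0.3, 0.3, 0.3]    # no gate-voltage dependence
print(visibility(sharp_peaks), visibility(flat_sweep))
```

Sharp peaks separated by zero-conductance intervals give a visibility of 1 (full charge quantization), whereas a gate-independent conductance gives 0.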

Figure 2b shows ΔQ versus τ_R at T ≈ 17 mK, with each set of symbols
corresponding to a different tuning of the second QPC (τ_L ∈ {0.075,
0.24, 0.49, 0.75, 0.975, 0.983}). The robustness of charge quantization
with the connection strength of one channel (τ_R) is established
independently of the second channel (τ_L), from the nearly constant
ΔQ for τ_R < 0.6. When further increasing τ_R, ΔQ noticeably
diminishes and systematically collapses to zero precisely at the ballistic
critical point τ_R = 1. For τ_R > 1, in the presence of one ballistic channel,
ΔQ remains zero within experimental accuracy (see Methods for
additional tests).

Power laws characterizing the scaling of charge quantization as
τ_R → 1 are best revealed by plotting ΔQ versus the ‘distance’ from the
ballistic critical point 1 − τ_R > 0 on a log-log scale. As shown in
Fig. 3, the T ≈ 17 mK data (symbols) systematically vanish as √(1 − τ_R)
(straight lines) for 1 − τ_R < 0.02.
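
On a log-log scale a power law appears as a straight line whose slope is the exponent. The sketch below estimates that slope by an ordinary least-squares fit; the data are synthetic (generated from a square-root law with an assumed prefactor of 5.7), not the measured points, and the function name is illustrative.

```python
import math
from typing import Sequence


def loglog_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Least-squares slope of log(y) versus log(x)."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    return sxy / sxx


# Synthetic visibility following a square-root law in the distance 1 - tau_R.
distances = [0.001, 0.002, 0.005, 0.01, 0.02]
synthetic_dq = [5.7 * math.sqrt(d) for d in distances]
print(loglog_slope(distances, synthetic_dq))
```

A fitted slope of 1/2 is the signature of the √(1 − τ_R) scaling; applied to real data, the fit range would have to be restricted to the regime where the power law holds.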
Figure 4 | Crossover to a universal charge quantization scaling as
temperature is increased. a, Symbols display ΔQ versus 1 − τ_R at
τ_L = 0.75 and for T ≈ 17 mK, 32 mK, 47 mK, 82 mK, 119 mK and 166 mK,
from top to bottom. The τ_R range over which ΔQ ∝ √(1 − τ_R) (straight,
dashed lines) extends up to the full interval τ_R ∈ [0, 1] when increasing T.
b, The rescaled ΔQ/√(1 − τ_L) is shown versus 1 − τ_R, with a different set of
symbols corresponding to different QPC_L set-points as in c. Solid lines
separate the data at T ≈ 17 mK (top, darker filling), T ≈ 47 mK (middle)
and T ≈ 82 mK (bottom, brighter filling). At T ≈ 82 mK, all the data
collapse onto a single universal curve ΔQ ∝ √((1 − τ_L)(1 − τ_R)). c, Symbols


The Coulomb blockade theory of electronic transport in the presence
of a nearly ballistic channel (1 − τ_R ≪ 1) relies on the bosonization
approach that was initially developed to address correlated electrons
in one dimension. Quantitative predictions were obtained for k_BT ≪ E_C
and for a second channel in either the tunnel (τ_L ≪ 1) or almost-ballistic
(1 − τ_L ≪ 1) regime25,28. In both cases, ΔQ is expected to vanish as
√(1 − τ_R):

$$\Delta Q(\tau_L \ll 1,\ 1-\tau_R \ll 1,\ k_B T \ll E_C) \approx 5.7\sqrt{1-\tau_R} \qquad (1)$$

$$\Delta Q\left(\sqrt{1-\tau_{L,R}} \ll \frac{k_B T}{E_C} \ll 1\right) \approx \frac{0.57\,E_C}{k_B T}\sqrt{(1-\tau_L)(1-\tau_R)} \qquad (2)$$

Such a scaling, initially proposed in ref. 6, was also predicted for 
the gate-voltage modulation of thermodynamic quantities for 
multi-channel junctions using an extension® of the instanton 
technique’”’. 

The data establish the √(1 − τ_R) scaling for arbitrary τ_L ∈ [0, 1],
beyond the tunnel and ballistic limits currently accessible to transport
theory. The dashed lines in Fig. 3 display the asymptotic
(√(1 − τ_R) ≪ k_BT/E_C), quantitative predictions of equation (2) for our
completely characterized device at T ≈ 17 mK, without fitting
parameters. The non-asymptotic ΔQ predictions (equation (1) for
τ_L ≪ 1; see Methods for 1 − τ_L ≪ 1) are shown versus 1 − τ_R < 0.25 as
solid lines. Data and quantitative predictions are indistinguishable
for 1 − τ_R < 0.1 for τ_L = 0.983, τ_L = 0.975 and, more surprisingly,
τ_L = 0.75. The equation (1) prediction (black line in Fig. 3) remains
noticeably (about 25%) above the τ_L = 0.075 data for 1 − τ_R ≪ 1. This
numerical difference could result from the finite experimental T,
because equation (1) is exact only at T = 0.

We now investigate the ways in which the combination of thermal 
and quantum fluctuations impacts the quantization of charge. As 
temperature rises, the population of additional charge states is expected 
to average out charge quantization!”. Figure 4a displays the measured 
ΔQ (symbols) versus 1 − τ_R at different temperatures, from T ≈ 17 mK
(darker filling) to T ≈ 166 mK (brighter filling), for the representative
QPC_L setting τ_L = 0.75. As expected, ΔQ decreases as
T increases. In line with thermodynamic expectations8 (Methods), the
ΔQ ∝ √(1 − τ_R) scaling (straight lines) that originates from quantum

display the fully rescaled data ΔQ/√((1 − τ_L)(1 − τ_R)) versus T on a semi-log
scale, extracted in the regime in which 1 − τ_R is small enough that
ΔQ ∝ √(1 − τ_R); data for τ_L = 0.975 are plotted only for T < 47 mK.
Horizontal error bars represent the experimental temperature uncertainty
at T = 17 ± 4 mK and T = 32 ± 1 mK. Solid lines are the quantitative
predictions in the quantum regime k_BT ≪ E_C, given by equation (1) (black,
horizontal) and equation (2) (green, curved). The straight dashed line
displays an exponential decay close to predictions in the presence of strong
thermal fluctuations (see text).


fluctuations not only persists for increasing T, but extends over a
widening range of τ_R, up to the full-scale τ_R ∈ [0, 1].

The crossover towards this universal behaviour is established by
plotting the rescaled visibility ΔQ/√(1 − τ_L) for different τ_L settings
against 1 − τ_R. The symbols in Fig. 4b represent the rescaled data at
T ≈ 17 mK, T ≈ 47 mK and T ≈ 82 mK, with brighter filling at higher
temperatures. As T increases, the scatter associated with the various τ_L
values narrows. For T ≥ 82 mK, the rescaled data collapse onto a single,
universal (for all τ_L), straight line ΔQ ∝ √((1 − τ_L)(1 − τ_R)) over the
full range τ_L,R ∈ [0, 1].


The temperature dependence is further characterized by plotting
ΔQ/√((1 − τ_L)(1 − τ_R)) (determined at low enough 1 − τ_R such that
ΔQ ∝ √(1 − τ_R)) versus temperature on a semi-log scale (Fig. 4c,
symbols). The k_BT ≪ E_C prediction of equation (1) (equation (2)) is
displayed as a black (green) solid line for T < 75 mK (T < 115 mK). We
find for T ≥ 82 mK (up to 166 mK, 2.8 ≲ π²k_BT/E_C ≲ 5.6) that the
different τ_L data points collapse onto the same exponential decay
(dashed line in Fig. 4c): ΔQ ∝ √((1 − τ_L)(1 − τ_R)) exp(−0.8π²k_BT/E_C).
We have extended the Coulomb blockade theory for the conductance
to include thermal fluctuations in the limits of tunnel or nearly ballistic
channels (Methods). In the regime of strong thermal averaging,
we predict ΔQ ∝ √((1 − τ_L)(1 − τ_R)) exp(−π²k_BT/E_C) (neglecting
factors not exponential in T), a dependence that is also expected for
thermodynamic properties8 (Methods), in close agreement with the
experimental findings regarding the effect of τ_L,R and T.
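
To get a feel for the size of the thermal suppression, the exponent π²k_BT/E_C can be evaluated at the measured temperatures. This sketch assumes the rounded value E_C ≈ k_B × 0.3 K from the Coulomb diamonds; the function names are illustrative.

```python
import math

E_C_KELVIN = 0.3   # assumed: rounded charging energy E_C / k_B from the text


def thermal_exponent(temperature_kelvin: float) -> float:
    """Return pi^2 * k_B * T / E_C."""
    return math.pi ** 2 * temperature_kelvin / E_C_KELVIN


def thermal_suppression(temperature_kelvin: float) -> float:
    """Exponential factor exp(-pi^2 k_B T / E_C) of the predicted scaling."""
    return math.exp(-thermal_exponent(temperature_kelvin))


for t_mk in (82, 119, 166):
    t_k = t_mk * 1e-3
    print(t_mk, round(thermal_exponent(t_k), 2), thermal_suppression(t_k))
```

With the rounded E_C the exponent runs from about 2.7 at 82 mK to about 5.5 at 166 mK, close to the range quoted in the text.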

Although theoretical predictions for low-temperature transport 
currently apply to only the nearly ballistic and tunnel limits, we anticipate 
that recent advances, including those in numerical renormalization 
group methods”, will open up access to the full range of connection 
strengths. Our results may therefore provide a test-bed for strongly 
correlated electron-theoretical methods, for which non-perturbative 
techniques are ubiquitous. The understanding and on-demand control 
of charge quantization in mesoscopic circuits might lead to applications
beyond the field of single electronics. The central role of charge
quantization in the different quantum laws of electricity with coherent
conductors indicates that direct quantum engineering could have implications
for future nanoelectronics, such as semiconductor-metal hybrid
devices that are crucial for developing topologically protected quantum 
bits®. The hybrid implementation we have presented also enables further 
fundamental exploration, including of charge quantization with correlated
electrons such as in the multi-channel Kondo regime and/or with
fractionally charged anyonic quasiparticles.
METHODS

Sample. The sample is nanostructured by standard e-beam lithography in a GaAs/
Ga(Al)As 2DEG located 105 nm below the surface, of density 2.5 × 10¹¹ cm⁻² and
mobility 10⁶ cm² V⁻¹ s⁻¹. The ohmic contact between the micrometre-scale
metallic island and the buried 2DEG is obtained by thermal diffusion into the
semiconductor of a metallic multilayer of nickel (30 nm), gold (120 nm) and germanium
(60 nm); see, for example, ref. 31. See methods in ref. 32 for the estimation of the
typical energy spacing between electronic levels in the central metallic island on
the same sample.

Experimental set-up. The measurements were performed in a dilution refrigerator 
including multiple filters along the electrical lines and two shields at the mixing 
chamber. Conductance measurements were carried out by standard lock-in 
techniques at low frequencies, below 100 Hz, taking advantage of the chiral 
current propagation in the quantum Hall regime (see Extended Data Fig. 1). Noise 
measurements for the electronic temperature were performed in the megahertz 
range using a homemade cryogenic amplifier (for details, see the supplementary 
information of ref. 33). 

Electronic temperature. The displayed electronic temperatures correspond to 
those extracted on-chip using either quantum shot noise primary thermometry34
or thermal noise thermometry, with error bars encapsulating also the outcome
of Coulomb blockade oscillations primary thermometry (at T < 32 mK) and/or
standard thermometry from RuO2 resistors thermally anchored to the mixing
chamber (at T > 32 mK).

Interface between the metallic island and the 2DEG. A 2DEG-metallic island
transmission probability τ_Ω-out > 0.9995 is obtained with the self-calibrated
procedure described below. Here, the switches are set in open positions as in
Fig. 1b (with edge channels following the red lines shown in Fig. 1a and Extended
Data Fig. 1). First, QPC_L,R are set at τ_L,R = 1, in the middle of the very flat and broad
intermediate plateau (owing to the robust quantum Hall effect), and we measure
the reflected signal V_RR^(τ_L,R=1) (see Extended Data Fig. 1). The average transmission
probability τ_Ω-out of the first (outer-edge quantum Hall) channel emitted from
QPC_L and QPC_R into the metallic island then follows from:

$$V_{RR}^{\tau_{L,R}=1} = G_R\left(1 - \tau_{\Omega\text{-out}}/4\right)V_R$$

with V_R the (a.c.) voltage applied at the input of QPC_R (see Extended Data Fig. 1)
and G_R the gain of amplification chain R. Second, we eliminate calibration
uncertainties by measuring the reflected signal V_RR^(τ_L,R=0) = G_R V_R with QPC_L,R depleted
(τ_L,R = 0). The ratio V_RR^(τ_L,R=1)/V_RR^(τ_L,R=0) gives τ_Ω-out directly. With this approach, we
obtain |1 − τ_Ω-out| < 5 × 10⁻⁴ (τ_Ω-out ≈ 0.9997 ± 0.0002). The same approach
including also the second (inner-edge quantum Hall) channel gives τ_Ω-in ≈ 0.9976.
Note that it is usual to have better ohmic contacts with the outer quantum Hall
channels, which are closest to the sample edges.
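
The self-calibrated procedure amounts to inverting the relation between the reflected-signal ratio and the interface transmission, ratio = 1 − τ/4. A minimal sketch follows; the function name and the numerical inputs are illustrative, not measured voltages.

```python
def interface_transmission(v_refl_tau1: float, v_refl_tau0: float) -> float:
    """Interface transmission from two reflected signals.

    v_refl_tau1: reflected signal with both QPCs on the tau = 1 plateau.
    v_refl_tau0: reflected signal with both QPCs depleted (tau = 0).
    The amplifier gain and the injected voltage cancel in the ratio.
    """
    ratio = v_refl_tau1 / v_refl_tau0
    return 4.0 * (1.0 - ratio)


# Illustrative numbers: a ratio of 0.750075 corresponds to tau = 0.9997.
print(interface_transmission(0.750075, 1.0))
```

Because only a ratio of two signals measured on the same amplification chain enters, the result does not depend on the gain calibration.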

Short-circuit switch operation. In practice, closing the short-circuit switches is 
realized by changing the voltage applied to the adjacent characterization gate (blue 
in Fig. 1a, see Extended Data Fig. 2a for the conductance versus gate voltage of
switch R) from —0.35 V (2DEG depleted/switch open) to 0.1 V (two edge channels 
perfectly transmitted/switch closed). 

Capacitive crosstalk corrections. The transmission probability across each QPC
is slightly modified when changing the voltage applied either to its adjacent
characterization gate or to the gate tuning the other QPC. Owing to the large,
micrometre-scale distances, this modification remains relatively small, particularly
near the ballistic critical point (<1% for τ_L,R ∈ [0.9, 1] when changing the adjacent
switch from closed to open). Let us first consider the crosstalk from one QPC to
the other, which is more straightforward to extract. For this purpose, the
characterization gate adjacent to the QPC for which the crosstalk is to be compensated
is set to its short-circuit/closed position (as in Fig. 1c) such that the gate-voltage
change that tunes the other QPC is felt only through the capacitive crosstalk. We
find that this crosstalk can be precisely compensated by a relatively small shift
(approximately −1%) of the split gate voltage. Regarding the capacitive crosstalk
due to the adjacent characterization gate, the difficulty is to isolate this contribution
from changes in the Coulomb blockade renormalization of the QPC conductance.
To suppress this renormalization, the other QPC is set in the middle of its τ_L(R) = 1
plateau and we apply a large d.c. bias voltage compared to the charging energy.
Extended Data Fig. 2b displays the differential conductance of QPC_R, measured
in the presence of an applied d.c. bias of 72 µV, versus split gate voltage for the
adjacent switch set to position open (red line) and closed (blue line). The gate
voltage shift ΔV that is needed to compensate the crosstalk is determined at low
QPC conductances, below 0.1e²/h, for which the d.c. voltage drop across the QPC
is nearly independent of the switch position. Extended Data Fig. 2c displays as
symbols the crosstalk compensation for QPC_L,R in response to increasing the
adjacent characterization gate voltage from −0.5 V. The amplitude of the
negative crosstalk compensation is found to increase linearly, with different slopes
for different values of the switch conductance. Indeed, the capacitive crosstalk
depends on the precise paths of the edge channels, which screen the gate potentials.
The crosstalk compensations used in the experiment when setting the adjacent
switch from open to closed are ΔV = −6 mV for QPC_R and ΔV = −10 mV
for QPC_L.

Calibrations. The reflected signal V_RR is normalized by the signal V_RR^(τ_L,R=0)
measured when setting τ_L,R = 0. The injection voltage and amplifier gain thereby
cancel out in the expression of the SET conductance G_SET:

$$G_{SET} = \frac{2e^2}{h}\left(1 - \frac{V_{RR}}{V_{RR}^{\tau_{L,R}=0}}\right)$$

To reduce the noise level, we also extract G_SET from the (redundant) transmitted
signal V_LR (see Extended Data Fig. 1):

$$G_{SET} = \frac{2e^2}{h}\,\frac{V_{LR}}{V_{RR}^{\tau_{L,R}=0}}\,\frac{G_R}{G_L}$$

with G_R (G_L) the gain of amplification chain R (L). The ratio G_R/G_L is determined
by setting QPC_L,R at τ_L,R = 1 and measuring both the reflected (V_RR^(τ_L,R=1))
and transmitted (V_LR^(τ_L,R=1)) signals:

$$\frac{G_R}{G_L} = \frac{1 - V_{RR}^{\tau_{L,R}=1}/V_{RR}^{\tau_{L,R}=0}}{V_{LR}^{\tau_{L,R}=1}/V_{RR}^{\tau_{L,R}=0}} \approx 1.0105$$


Experimental determination of ΔQ. For τ_R < 0.99, the signal-to-noise ratio is always
sufficient to accurately extract the values of G_SET^max,min directly from the periodic
conductance maximums and minimums, which stand out very strongly from the
background noise. The error bars on the visibility ΔQ = (G_SET^max − G_SET^min)/(G_SET^max + G_SET^min)
were calculated from the statistical uncertainty on G_SET^max,min, which is typically
estimated from ten different sweeps of one period. In this regime (τ_R < 0.99), the
calculated error bars are smaller than the size of the symbol and are therefore not
shown.

For τ_R ∈ [0.99, 0.998], although the periodic oscillations can still be clearly
distinguished in the raw data (see Extended Data Fig. 3), the above direct procedure
would result in uncertainties that can become quite large, especially at base
temperature and in the presence of a weakly transmitted second channel (τ_L = 0.075).
To improve our extraction of ΔQ, we take advantage of the observation that the
conductance oscillations are sinusoidal for τ_R > 0.98 (see Extended Data Fig. 3), as
is expected from theory (see equations (3) and (6), and solid lines in Extended Data
Fig. 3): the visibility of the conductance oscillations ΔQ is then extracted from a
sinusoidal fit of the conductance sweeps G_SET(V_g). The displayed error bars are the
statistical error on the mean value obtained from the distinct ΔQ values obtained
by separately fitting approximately six different conductance sweeps. The two
procedures give the same value of ΔQ in the intermediate regime τ_R ∈ [0.98, 0.99] for
which they both accurately apply.
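
One simple way to extract the visibility of a sinusoidal modulation at a known period is sketched below. It is not necessarily the fitting routine used by the authors: it projects the sweep onto cosine and sine at the known period, which coincides with a least-squares sinusoidal fit only under the stated assumption of uniform sampling over an integer number of periods. The data are synthetic and all names are illustrative.

```python
import math
from typing import Sequence


def sinusoid_visibility(v_gate: Sequence[float], g_set: Sequence[float],
                        period: float) -> float:
    """Amplitude/mean of a sinusoidal modulation at a known period.

    Assumes uniform sampling over an integer number of periods, so that
    the cos/sin projections equal the least-squares fit coefficients.
    """
    n = len(g_set)
    mean = sum(g_set) / n
    phases = [2.0 * math.pi * v / period for v in v_gate]
    a = 2.0 / n * sum(g * math.cos(p) for g, p in zip(g_set, phases))
    b = 2.0 / n * sum(g * math.sin(p) for g, p in zip(g_set, phases))
    return math.hypot(a, b) / mean


# Synthetic sweep: 10 periods, 20 points per period, 5% modulation.
period = 1.0
n_points = 200
v_gate = [10.0 * period * i / n_points for i in range(n_points)]
g_set = [0.2 * (1.0 - 0.05 * math.cos(2.0 * math.pi * v / period + 0.3))
         for v in v_gate]
print(sinusoid_visibility(v_gate, g_set, period))
```

For a pure sinusoid the ratio of amplitude to mean equals (G_max − G_min)/(G_max + G_min), so the result can be compared directly with the visibility obtained from the extrema.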

For τ_R > 1, there are no periodic oscillations directly visible in the raw
conductance sweeps G_SET(V_g) (see right panel in Fig. 2a). To put experimental bounds on
the basic statement ΔQ ≈ 0, we determined the visibility ΔQ (displayed in Fig. 2b)
using the following procedure. First, we determine the most probable positions of
the conductance maximums and minimums by ‘fitting’ a conductance sweep
(typically extending over ten Coulomb oscillation periods) with a sinusoidal
function at the known period of Coulomb oscillations, using its phase as a fitting
parameter. For each of these positions, a different value of G_SET^max or G_SET^min is obtained
by averaging the data over an extension of one quarter of a period (assuming
sinusoidal oscillations, this would result in a visibility reduction smaller than 10%).
By separately extracting G_SET^max,min for the approximately ten periods, we calculate
their mean values and estimate the corresponding standard errors. The error bars
displayed in Fig. 2b are the standard error on the mean value of ΔQ, obtained from
the statistical uncertainty on G_SET^max,min.

Predictions in the quantum asymmetric regime. In the quantum asymmetric regime
(k_BT ≪ E_C, τ_L ≪ 1, 1 − τ_R ≪ 1), the conductance reads (equation (34) in ref. 28):

$$G_{SET}^{\tau_L \ll 1,\ 1-\tau_R \ll 1} = \frac{\tau_L e^2}{h}\,\frac{2\pi^4 (k_B T)^2}{3\gamma^2 E_C^2}\left[1 - 2\gamma\xi\sqrt{1-\tau_R}\cos(2\pi\delta V_g/\Delta)\right] \qquad (3)$$

with γ ≈ exp(0.5772), ξ ≈ 1.59, Δ the gate-voltage period and δV_g the
gate-voltage difference from charge degeneracy. In the ballistic limit (1 − τ_R = 0),
the conductance does not depend on gate voltage, but vanishes as T? following 
quantitatively, with the exact same pre-factor, the dynamical Coulomb blockade 
predictions? for the same Ec and the corresponding series resistance R=h/e*. Using 
equation (3), the visibility of the oscillations of conductance reads: 


$$\Delta Q(\tau_L \ll 1,\ 1-\tau_R \ll 1) = 2\gamma\xi\sqrt{1-\tau_R} \qquad (4)$$
The temperature dependence of G_SET (associated with dynamical
Coulomb blockade) cancels out in ΔQ. Charge discreteness also affects the
gate-voltage dependence of thermodynamic quantities, such as the average charge
(⟨Q⟩) or the differential capacitance (C_diff = ∂⟨Q⟩/∂V_g). The effect of Coulomb
blockade on thermodynamic quantities was studied most comprehensively for
tunnel junctions!”°: at T=0 and G> e’/h, the amplitude of average charge 
oscillations decays exponentially with Gh/e’; see, for example, refs 35-37. The 
theoretical extension to multi-channel junctions of arbitrary transmission, beyond 
the tunnel limit, was performed in ref. 8. In the presence of a single, nearly ballistic 
channel, the bosonization approach allows for an exact solution of the average 
charge in the metallic island in the low-energy ‘quantum’ regime k_BT ≪ E_C
(equation (26) in ref. 7):

$$\langle Q\rangle^{\tau_L \ll 1,\ 1-\tau_R \ll 1} = \frac{eV_g}{\Delta} - \frac{e\gamma}{\pi}\sqrt{1-\tau_R}\,\sin(2\pi V_g/\Delta) + Q_0$$

with Q_0 a charge offset. In the ballistic limit (1 − τ_R = 0), the charge increases
linearly with gate voltage, corresponding to an absence of charge quantization. The
degree of charge quantization can be characterized by the relative amplitude of the
oscillations of charge or, equivalently, by the visibility of the differential capacitance
(C_diff = ∂⟨Q⟩/∂V_g) oscillations:

$$\Delta C_{diff}(\tau_L \ll 1,\ 1-\tau_R \ll 1) \equiv \frac{C_{diff}^{max} - C_{diff}^{min}}{C_{diff}^{max} + C_{diff}^{min}} = 2\gamma\sqrt{1-\tau_R} \qquad (5)$$


The degree of charge quantization vanishes as √(1 − τ_R) when approaching the
ballistic limit, and does not depend on temperature in the quantum regime
(k_BT ≪ E_C). Importantly, the visibility in the SET conductance oscillations is
directly proportional to the visibility of the differential capacitance oscillations,
up to the fixed numerical factor ξ ≈ 1.59:


$$\Delta Q(\tau_L \ll 1,\, 1-\tau_R \ll 1) = \xi\,\Delta C_{\mathrm{diff}}(\tau_L \ll 1,\, 1-\tau_R \ll 1)$$


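Predictions in the quantum near-ballistic regime. In the quantum near-ballistic
regime (k_BT ≪ E_C, 1 − τ_{L,R} ≪ 1), the conductance G_SET reads (equations (38) and
(26) in ref. 25):

$$G_{\mathrm{SET}} = \frac{e^2}{2h}\left(1 - \int_0^{\infty} \frac{\Gamma_-^2}{(x\pi^2 k_BT/\gamma E_C)^2 + \Gamma_-^2}\,\frac{dx}{\cosh^2(x)}\right) \qquad (6)$$

with γ ≈ exp(0.5772) and

$$\Gamma_- = (1-\tau_L) + (1-\tau_R) - 2\sqrt{(1-\tau_L)(1-\tau_R)}\,\cos(2\pi\delta V_g/\Delta)$$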


with Δ the gate-voltage period. The quantitative ΔQ predictions calculated with
the maximum and minimum of G_SET inferred from equation (6) are displayed as
coloured solid lines in Fig. 3. When approaching the ballistic critical point
(√(1 − τ_{L,R}) ≪ k_BT/E_C ≪ 1), the visibility ΔQ reduces to the simple asymptotic
expression (equation (2) with γ/π ≈ 0.57, reprinted here for convenience):

$$\Delta Q\left(\sqrt{1-\tau_{L,R}} \ll \frac{k_BT}{E_C} \ll 1\right) \simeq \frac{\gamma E_C}{\pi k_BT}\sqrt{(1-\tau_L)(1-\tau_R)} \qquad (7)$$
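
As a rough numerical illustration, the sketch below evaluates the visibility from equation (6) and compares it with the asymptote of equation (7). It assumes that equation (6) reads G_SET = (e²/2h)[1 − ∫₀^∞ dx Γ₋² / ((xπ²k_BT/γE_C)² + Γ₋²) / cosh²(x)], works in units of e²/h, and uses a plain midpoint rule; the function names and the parameter values (τ_L = τ_R = 0.999, k_BT/E_C = 0.1) are illustrative choices, not taken from the experiment.

```python
import math

GAMMA = math.exp(0.5772)  # gamma ~ exp(0.5772), as defined with equation (6)


def gamma_minus(tau_l, tau_r, phase):
    """Gamma_- for transmissions tau_l, tau_r at gate phase 2*pi*dV_g/Delta."""
    r_l, r_r = 1.0 - tau_l, 1.0 - tau_r
    return r_l + r_r - 2.0 * math.sqrt(r_l * r_r) * math.cos(phase)


def g_set(tau_l, tau_r, phase, t_over_ec, steps=200_000, x_max=20.0):
    """Assumed form of equation (6) in units of e^2/h (midpoint rule in x)."""
    g2 = gamma_minus(tau_l, tau_r, phase) ** 2
    a = math.pi ** 2 * t_over_ec / GAMMA  # pi^2 k_B T / (gamma E_C)
    dx = x_max / steps
    integral = 0.0
    for i in range(steps):
        x = (i + 0.5) * dx
        integral += g2 / ((a * x) ** 2 + g2) / math.cosh(x) ** 2
    return 0.5 * (1.0 - integral * dx)


def visibility(tau_l, tau_r, t_over_ec):
    """(G_max - G_min)/(G_max + G_min); the extrema sit at phase 0 and pi."""
    g_hi = g_set(tau_l, tau_r, 0.0, t_over_ec)
    g_lo = g_set(tau_l, tau_r, math.pi, t_over_ec)
    return (g_hi - g_lo) / (g_hi + g_lo)


def visibility_asymptotic(tau_l, tau_r, t_over_ec):
    """Equation (7): (gamma/pi) (E_C / k_B T) sqrt((1 - tau_L)(1 - tau_R))."""
    return GAMMA / math.pi / t_over_ec * math.sqrt((1 - tau_l) * (1 - tau_r))


numeric = visibility(0.999, 0.999, 0.1)
asymptotic = visibility_asymptotic(0.999, 0.999, 0.1)
print(numeric, asymptotic)
```

Under these assumptions the two numbers should agree to within a few per cent, with the agreement improving as 1 − τ decreases at fixed temperature.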


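The differential capacitance (C_diff) when one QPC approaches the ballistic critical
point (τ_R → 1) reduces to the asymptotic expression (equation (41) in ref. 27):

$$C_{\mathrm{diff}}^{1-\tau_R \ll 1-\tau_L \ll 1} = 4\gamma\,\frac{e}{\Delta}\,\ln(1-\tau_L)\sqrt{(1-\tau_L)(1-\tau_R)}\,\cos\left(\frac{2\pi\delta V_g}{\Delta}\right) + \frac{e}{\Delta}$$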


and the visibility in the oscillations of the differential capacitance reads: 


$$\Delta C_{\mathrm{diff}}(1-\tau_R \ll 1-\tau_L \ll 1) = -4\gamma\ln(1-\tau_L)\sqrt{(1-\tau_L)(1-\tau_R)}$$

We recover the same √(1 − τ_R) scaling behaviour near the ballistic critical point
(τ_R = 1) that was found in the asymmetric regime (equations (4) and (5)), and
which is also found in the visibility of the conductance Coulomb oscillations
(equation (7)). For two identical (for example, spin-degenerate) channels
(τ ≡ τ_L = τ_R) near the ballistic critical point (1 − τ ≪ 1), the differential capacitance
reads (equations (49) and (52) in ref. 7; a factor eΔ/(2E_C) was applied to match the
definition C_diff ≡ ∂⟨Q⟩/∂V_g):
§ 
aii (1 — 7)sin? me + ket 
TA A Ec 


2n6V, e 
he B)4 = 
x( “co a |+ A 


T1)(1 TR) 


Cig T=1-TL=1-TRK<I1 


When approaching the ballistic critical point (τ → 1), the visibility in the oscillations
of the differential capacitance asymptotically vanishes as 1 − τ, as in equation (7)
with τ_L = τ_R.

Predictions in the presence of strong thermal fluctuations. In the presence of
strong thermal fluctuations, k_BT > E_C/π², charge discreteness leads to periodic
oscillations of the observables (for example, conductance and differential
capacitance) while sweeping a capacitively coupled gate voltage. Quantum
fluctuations decrease the oscillations, which are further attenuated by thermal
fluctuations for increasing temperature, until the amplitude becomes exponentially
small for k_BT ≫ E_C/π². The exponential temperature dependence in k_BT/E_C is
quite robust, applying to thermodynamic and transport (Methods) properties.
It can be demonstrated in the limits of both small and large transmission
probabilities of the conduction channels comprising the junctions, and for various
models of the metallic island. The presence of thermal fluctuations not only
preserves the quantum √(1 − τ) suppression of the oscillations, but it is expected
from the results of ref. 8 that the square-root scaling of the differential capacitance
extends with increasing temperature, up to the full range of τ_{L,R} ∈ [0, 1]. The
relative oscillations in the differential capacitance and in the conductance
characterize the degree of charge quantization equally well, both following the same
exp(−π²k_BT/E_C)√((1 − τ_L)(1 − τ_R)) behaviour. Further information regarding
the predictions and theoretical methods in the presence of strong thermal
fluctuations is provided in the following four sections.

Differential capacitance in the tunnel limit with strong thermal fluctuations.
This regime corresponds to k_BT ≫ E_C/π² and τ_{L,R} ≪ 1. To start with, we evaluate
the oscillatory part of the free energy of the island in the limit τ_{L,R} ≪ 1, where the
suppression of charge quantization is entirely due to thermal fluctuations.
Considering high temperatures, it is convenient to transform the partition function
of the isolated island,

$$Z = \sum_{n=-\infty}^{\infty} \exp\left[-\frac{E_C (n - N)^2}{k_BT}\right]$$

using the Poisson summation formula; the result is

$$Z = \sqrt{\frac{\pi k_BT}{E_C}} \sum_{k=-\infty}^{\infty} \exp(-2\pi i k N)\,\exp\left(-\frac{\pi^2 k_BT}{E_C}\,k^2\right) \qquad (8)$$


Here N = V_g/Δ (with Δ the period in gate voltage V_g) is the charge induced by
the gate voltage in units of e, and the summations are performed over integers n
and k. The k = 0 and k = ±1 terms in the sum in equation (8) yield, respectively,
the leading N-independent and N-dependent contributions F_0 and δF(N) to the
free energy F = −k_BT ln(Z) at k_BT ≫ E_C/π². The resulting oscillatory part of the
differential capacitance,

$$\delta C_{\mathrm{diff}}^{\tau_{L,R} \ll 1} = -\frac{e}{\Delta}\,\frac{1}{2E_C}\,\frac{\partial^2 \delta F}{\partial N^2} = -\frac{e}{\Delta}\,\frac{4\pi^2 k_BT}{E_C}\exp\left(-\frac{\pi^2 k_BT}{E_C}\right)\cos(2\pi N) \qquad (9)$$


is exponentially suppressed at high temperatures. 
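
The Poisson resummation and the resulting suppression can be checked numerically. The sketch below is only an illustration under stated assumptions: it takes the charging energy of the isolated island as E_C(n − N)², writes t = k_BT/E_C, measures the capacitance in units of e/Δ, and assumes that the leading oscillatory term has the form −4π²t·exp(−π²t)·cos(2πN). All function names are hypothetical.

```python
import math


def z_direct(n_gate, t_over_ec, n_max=200):
    """Z as a direct sum over integer island charges n."""
    return sum(math.exp(-((n - n_gate) ** 2) / t_over_ec)
               for n in range(-n_max, n_max + 1))


def z_poisson(n_gate, t_over_ec, k_max=20):
    """Z after Poisson resummation, as on the right-hand side of eq. (8)."""
    harmonics = sum(math.cos(2 * math.pi * k * n_gate)
                    * math.exp(-math.pi ** 2 * t_over_ec * k * k)
                    for k in range(-k_max, k_max + 1))
    return math.sqrt(math.pi * t_over_ec) * harmonics


def delta_c_numeric(n_gate, t_over_ec, h=1e-3):
    """Oscillatory part of C_diff (units of e/Delta) from F = -k_B T ln Z."""
    ln_z = lambda n: math.log(z_direct(n, t_over_ec))
    curvature = (ln_z(n_gate + h) - 2 * ln_z(n_gate) + ln_z(n_gate - h)) / h ** 2
    return 0.5 * t_over_ec * curvature


def delta_c_leading(n_gate, t_over_ec):
    """Assumed leading (k = +/-1) term of the oscillatory capacitance."""
    return (-4 * math.pi ** 2 * t_over_ec
            * math.exp(-math.pi ** 2 * t_over_ec)
            * math.cos(2 * math.pi * n_gate))


print(z_direct(0.3, 0.5), z_poisson(0.3, 0.5))
print(delta_c_numeric(0.5, 0.5), delta_c_leading(0.5, 0.5))
```

The two partition-function values should coincide to numerical precision, and at k_BT = 0.5 E_C the leading harmonic should already account for the oscillation to within a few per cent.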

Differential capacitance in the near-ballistic regime with strong thermal
fluctuations. This regime corresponds to k_BT ≫ E_C/π² and 1 − τ_R ≪ 1. A similar
suppression of oscillations of the thermodynamic characteristics can also be
demonstrated in the case of high-transmission junctions, where both thermal and
quantum fluctuations contribute to the reduction of charge quantization. For
definiteness, we consider here a single-junction case (τ_L = 0) with 1 − τ_R ≪ 1.
Evaluation of C_diff for τ_L = 0 and 1 − τ_R ≪ 1 can be performed using the
bosonization scheme developed in ref. 7. In that formalism, the N-dependent part
of the differential capacitance reads


sc, se = io ee (0)]) 


where the bosonic quantum field φ(0) = 2πQ/e corresponds to the charge Q
passed through the junction (x = 0), and D is the energy bandwidth appearing in
the definition of boson variables. Averaging ⟨...⟩ is performed over the fluctuations
of the field φ(x). The Hamiltonian describing these fluctuations consists of two
parts, representing the energy of particle-hole excitations and the charging energy,
respectively. The former part depends on (∇φ)² and the latter part has the form
E_C[φ(0)/(2π)]². Replacement of the ground-state averaging with an average over
the Gibbs distribution of fluctuations, which is proportional to
exp{−[E_C/(k_BT)][φ(0)/(2π)]²}, results in the renormalization of the bandwidth D
to a physically meaningful value of approximately k_BT and in exponential
suppression of the oscillations at k_BT ≫ E_C/π²:
As follows from ref. 8, equation (10) is applicable in the full range of τ_R for
k_BT ≫ E_C/π² (the numerical coefficient in equation (10) was established with the
help of ref. 8). The identical exponential suppression for an almost-isolated island
(equation (9)) is therefore simply equation (10) in the limit τ_R ≪ 1. In addition,
quantum fluctuations contribute the same suppression factor √(1 − τ_R) derived
at 1 − τ_R ≪ 1 in the quantum regime k_BT ≪ E_C (equation (5)). Furthermore,
equation (10) derived for k_BT ≫ E_C matches the T = 0 result of ref. 7 at k_BT ≈ E_C;
however, given the large numerical factor π² in the exponent of equation (10), there
may be a broad crossover temperature region between the two limits.

Conductance in the tunnel limit with strong thermal fluctuations. This regime
corresponds to k_BT ≫ E_C/π² and τ_{L,R} ≪ 1. Turning now to conductance oscillations,
we again start from the simpler case of low-transmission barriers (τ_{L,R} ≪ 1). In that
limit, the rate equation for current carried by spin-polarized electrons yields:


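$$G_{\mathrm{SET}}^{\tau_{L,R} \ll 1}(N, T) = \frac{e^2}{h}\,\frac{\tau_L\tau_R}{\tau_L + \tau_R} \sum_{n=-\infty}^{\infty} \frac{\exp[-E_n(N)/(k_BT)]}{Z(N, T)}\, f\!\left(\frac{E_n(N) - E_{n+1}(N)}{k_BT}\right)$$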


where f(x) = x/(1 − e^{−x}). Application of the Poisson summation formula to the
above equation is tedious, but straightforward. The result is an expression for
G_SET (τ_{L,R} ≪ 1) that involves a sum of harmonics proportional to cos(2πkN), similar to
equation (8). The largest term,


$$G_\infty = \frac{e^2}{h}\,\frac{\tau_L\tau_R}{\tau_L + \tau_R} \qquad (11)$$


does not oscillate and is simply the conductance of two resistors connected in 
series. The leading oscillatory term, 
exhibits the same exponential suppression as the differential capacitance (equation
(9)).

Conductance in the near-ballistic regime with strong thermal fluctuations. This
regime corresponds to k_BT ≫ E_C/π² and 1 − τ_{L,R} ≪ 1. Regarding the conductance
across a metallic island with high-transmission contacts, we (A) present a
formalism that is somewhat different from ref. 25, details of which will be published
separately (E.L., I.P.L. and E.V.S., manuscript in preparation) and (B) further
establish the predictions by extending the formalism of ref. 25 to high temperatures.
(A) In the first approach, we start from the chiral edge excitations of the integer 
quantum Hall regime, in close correspondence with the experimental configuration. 
Although we are interested in the high-temperature limit, all the energy scales in 
the experiment remain much smaller than the quantum Hall energy gap. At such 
low energies, the quantum Hall edge states may be described by the effective 
theory. According to this theory, edge excitations can be viewed as bosonic edge
magneto-plasmons. The corresponding one-dimensional charge density waves
ρ_{sα}(x) (s ∈ {L, R}, α ∈ {1, 2}; see Extended Data Fig. 4 for notations) satisfy the
canonical commutation relations [p,.,(x), Psa] = (-1)° Wie7Sss/5ap6'(x — y)> 
where the sign accounts for the propagation direction of the chiral edge states, δ_{ss′}
and δ_{αβ} are Kronecker delta functions, and δ′(x − y) is the derivative of the Dirac
delta function.

The Hamiltonian of the experimental set-up contains three terms:
H = H_0 + H_int + H_T. The first term describes the dynamics of the bare edge states:


$$H_0 = \frac{h v_F}{2e^2} \sum_{s,\alpha} \int dx\, \rho_{s\alpha}^2(x)$$


where v_F is the Fermi velocity of the quantum Hall edge states. The second term
describes Coulomb interactions at the metallic island:

$$H_{\mathrm{int}} = E_C (Q/e - N)^2 \qquad (12)$$


Q=Fterl0)—e(M=T]f asin) +f dep] 03) 


2 


The first equality in equation (13) defines the Bose field operators that are also used 
in the derivation of equation (10), but here for the case of two contacts. The last 
term in the Hamiltonian describes the backscattering of electrons at the two QPCs: 
H_T = A_L + A_R + h.c.


$$A_s = \gamma_s\, \psi_{s1}^{\dagger}(0)\, \psi_{s2}(0) \qquad (14)$$


(15) 


’sq(0) = , (mm pm ‘f AX P50, (X) 


where the backscattering amplitudes γ_{L,R} depend on the ‘intrinsic’ transmission
probabilities τ_{L,R} (in the near ballistic regime, 1 − τ_{L,R} ≈ |γ_{L,R}|²/(ħv_F)²).

We set the distance between the metallic island and the QPCs, which is much 
shorter than the wavelength of excitations in the experiment, to zero. We stress 
that exactly the same Hamiltonian arises in the absence of the quantum Hall 
effect, when applying the bosonization procedure to a metallic island connected 
to reservoirs through spin-polarized electron channels (as in refs 7 and 25). 
Consequently, the predictions below apply beyond the quantum Hall configuration 
used here as a starting point. 

Focusing on the near ballistic regime 1 − τ_{L,R} ≪ 1, we apply the scattering theory
approach developed in refs 42 and 43. The average ⟨I⟩ = tr(ρI) of the current
operator I = v_F[ρ_{R1}(0) − ρ_{R2}(0)] is evaluated perturbatively in backscattering
amplitudes (equation (14)). With this aim, we express the density matrix ρ = Uρ_0U†
in terms of its equilibrium value ρ_0 ∝ exp[−(H_0 + H_int)/(k_BT)], and expand the
evolution operator U = Texp[−(2πi/h)∫dt H_T(t)] in powers of γ_s, where Texp
indicates the time-ordered exponential. This results in the two leading terms:


+55 J de! de" (P(e) [Her(e"), 1) (16) 


0 
where the average is taken with respect to the equilibrium density matrix ρ_0.

The Hamiltonian H_0 + H_int is quadratic in plasmon operators. Consequently,
the corresponding dynamics can be accounted for exactly within the scattering
theory approach for bosons. For instance, the scattering matrix for the
interaction Hamiltonian H_int (ignoring the backscattering Hamiltonian H_T), which
relates the currents in the incoming (L1, R2, L2, R1) and outgoing (L2, R1, L1, R2)
channels at the frequency ω/(2π), reads:


z 2 22 = 

1] z Zz —z 2-z 
S(w) == 17 
(w) Ii2—2 2 z z a7) 

=—2 2-2 2 Zz 


where z = 1/[ihω/(4E_C) + 1]. Taking the limit ω → 0 (ref. 43), we determine the
first term in equation (16): ⟨I⟩_0 = e²V_dc/(2h). The bare conductance is thus half
the conductance quantum. In the limit of small d.c. bias V_dc, the second term can
be rewritten as


8(I) = fat(LAL(¢) + AR (2), AL(0) + Ar(0)]), 


This term contains the coherent contribution 


(Dose = akevi7e dt(p, (0, t)y,(0, t)wi,(0, 0) p9(0, 0) 6 (18) 
which oscillates as a function of the induced charge eN. 

In general, one can use the scattering matrix in equation (17) to evaluate the
average in equation (18), which leads to a complex expression (E.L., I.P.L. and E.V.S.,
manuscript in preparation). However, the leading high-temperature asymptotics
can be found using exactly the same argument as for the case of the differential
capacitance considered above. Specifically, according to equation (15), the
particular value of the charge Q in the island leads to the phase shift of
exp[2πi(Q/e − N)] in the correlation function in equation (18). Therefore, by
averaging the correlation function over instant fluctuations of this charge, which
are distributed with the equilibrium Gibbs weights (proportional to
exp[−(Q/e)²E_C/(k_BT)]), we determine the high-temperature behaviour of the
oscillating part of the current:


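$$\delta\langle I\rangle_{\mathrm{osc}} \propto \frac{e^2 V_{\mathrm{dc}}}{h}\sqrt{(1-\tau_L)(1-\tau_R)} \int dQ\, \exp\left[-\frac{Q^2 E_C}{e^2 k_BT}\right] \cos[2\pi(N - Q/e)] \propto \frac{e^2 V_{\mathrm{dc}}}{h}\exp\left[-\frac{\pi^2 k_BT}{E_C}\right]\sqrt{(1-\tau_L)(1-\tau_R)}\,\cos(2\pi N)$$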


The validity of this simplified approach is confirmed by calculations (E.L., I.P.L.
and E.V.S., manuscript in preparation).
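
The Gibbs averaging step can be illustrated numerically. The sketch below, with q = Q/e and t = k_BT/E_C, averages cos[2π(N − q)] over Gaussian weights exp(−q²/t) and compares the result with exp(−π²t)·cos(2πN), the value expected from the standard Gaussian integral. The function name and the sample values are illustrative assumptions.

```python
import math


def gaussian_average_cos(n_gate, t_over_ec, q_max=12.0, steps=48_000):
    """Average of cos[2*pi*(N - q)] with weights exp(-q^2 / t), q = Q/e."""
    dq = 2 * q_max / steps
    num = 0.0
    den = 0.0
    for i in range(steps):
        q = -q_max + (i + 0.5) * dq  # midpoint rule on [-q_max, q_max]
        weight = math.exp(-q * q / t_over_ec)
        num += weight * math.cos(2 * math.pi * (n_gate - q))
        den += weight
    return num / den


n_gate, t = 0.2, 0.5
averaged = gaussian_average_cos(n_gate, t)
expected = math.exp(-math.pi ** 2 * t) * math.cos(2 * math.pi * n_gate)
print(averaged, expected)
```

The averaged oscillation keeps its period in N, while its amplitude is reduced by the factor exp(−π²k_BT/E_C).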
(B) An alternative route of calculation amounts to re-working equation (A5) of
ref. 25 for the case 1 − τ_{L,R} ≪ 1 or equation (A27) for the asymmetric case τ_L ≪ 1,
1 − τ_R ≪ 1. In either case, the largest term in the limit k_BT ≫ E_C/π² is,
unsurprisingly, N-independent. Like equation (11), it represents the conductance
of two junctions connected in series: G_∞ ≈ e²/(2h) in the case of 1 − τ_{L,R} ≪ 1 and
G_∞ ≈ (e²/h)τ_L in the case τ_L ≪ 1, 1 − τ_R ≪ 1. The leading oscillatory term in the
former case is


wkpT 
Ec 


2 
icf WT) = Sen | (1—7T1)(1— TR) cos(2nNV) 


In the asymmetric case, the factor √((1 − τ_L)(1 − τ_R)) in the above expression is
replaced by τ_L√(1 − τ_R). The visibility of conductance oscillations now reads:


$$\Delta Q \sim \exp\left[-\frac{\pi^2 k_BT}{E_C}\right]\sqrt{(1-\tau_L)(1-\tau_R)}$$


This form correctly extrapolates between the symmetric and asymmetric cases.

Conductance at T ≈ 17 mK versus quantum regime predictions. Although the
visibility ΔQ of the oscillations in the SET conductance best reflects the degree
of charge quantization, we can also confront experiment and theory directly at
the underlying conductance-sweeps level. In Extended Data Fig. 3, we compare
G_SET(δV_g) measurements (symbols) and predictions near the ballistic critical
point (1 − τ_R ≈ 0.02 and 0.004) with QPC_L in both the tunnel (τ_L = 0.075) and
almost perfectly transmitted (1 − τ_L ≈ 0.02) regimes. Solid lines are calculated
with the electronic temperature T = 17 mK, using equation (3) for the top two
panels (asymmetric regime, τ_L = 0.075) and equation (6) for the bottom two panels
(near ballistic regime, τ_L = 0.983). The grey areas correspond to the experimental
uncertainty of ±4 mK. The demonstrated agreement validates the full prediction
for the renormalized SET conductance.

Charge quantization based on conductance or transmission probability values.
Theory predicts that as soon as one conduction channel connected to the metallic
island is ballistic, the charge in the island is completely unquantized. Here we show
that charge quantization collapses systematically at the ballistic critical point τ_R = 1,
independent of the setting of the second channel (τ_L < 1). We further demonstrate
that the crucial ingredient is not the overall conductance, but the presence of a
perfectly transmitted channel. For this purpose, we compare the two configurations
displayed in Extended Data Fig. 5a, b. In both configurations, QPC_L is tuned to the
same standard setting corresponding to a single conduction channel of ‘intrinsic’
transmission probability τ_L = 0.24. In both configurations, QPC_R is set to the same
overall intrinsic conductance G_R = τ_R e²/h = 1.5e²/h. However, in the first
configuration (Extended Data Fig. 5a) QPC_R decomposes into one ballistic channel
and one channel of intrinsic transmission probability 0.5, whereas in the second
configuration (Extended Data Fig. 5b) it decomposes into two non-ballistic
channels of intrinsic transmission probabilities 0.7 and 0.8. (In practice, the QPC_R
of the second configuration is realized using two different physical QPCs biased
at the same voltage.) As shown in Extended Data Fig. 5c, the SET conductance
displays strong oscillations in the second configuration, signalling charge
quantization in the absence of a ballistic channel. By contrast, the SET conductance
in the first configuration does not depend on gate voltage, signalling a completely
unquantized island charge in the presence of one ballistic channel.
Extended Data Figure 1 | Measurement schematic. The signal V_L (V_R)
is the voltage measured with amplification chain L (R) in response to the
injected voltage Vp. The trenches etched in the 2DEG, which can be seen
in the form of a ‘Y’ through the metallic island, ensure that the only way
from one QPC to the other is across the metallic island. The experiment
is performed in the quantum Hall regime at filling factor ν = 2, where the
current propagates along the edges in the direction indicated by arrows.
Extended Data Figure 2 | Crosstalk compensation. a, (Intrinsic)
conductance GR’ across the characterization gate adjacent to QPC_R versus
gate voltage V}”. In the experiment, the left and right switches are
independently set to the open and closed positions with VR, = −0.35 V
and VX’, = 0.1 V, respectively (vertical arrows in c). b, QPC_R differential
conductance in the presence of a d.c. bias of 72 μV (‘72 μV_dc’) versus QPC
gate voltage VW". The red and blue lines are measured with the adjacent
switch in the open and closed positions, respectively (see inset
schematics). The voltage drop across QPC_R is smaller with the switch
open, owing to the added series resistance. Although this does not result in
a large error, because Gi depends weakly on voltage bias, this effect is
minimized by extracting the crosstalk compensation AV§ at low
GP < 0.1e²/h. c, Symbols represent the crosstalk compensation AV,
with respect to the gate voltage V’ = −0.5 V, versus Vj’. Lines are
linear fits of the crosstalk compensation at GR’ = 0 (red, −2.8% relative
compensation), 0 < GR’ < 2e²/h (green, −1.1% relative compensation)
and Gy” = 2e²/h (blue, −1.4% relative compensation).
Extended Data Figure 3 | Conductance measurements versus quantitative predictions. Direct G_SET(δV_g) comparison at T ≈ 17 mK between data
(symbols) and predictions (solid lines; grey areas correspond to the temperature uncertainty of ±4 mK) in the two limits addressed by theory
(equation (3) for τ_L ≈ 0 (top panels), equation (6) for τ_L ≈ 1 (bottom panels)).
Extended Data Figure 4 | Theoretical description of the experimental
set-up in formalism (A) for strong thermal fluctuations. We consider
the regime of the quantum Hall effect, where only one spinless edge mode
contributes to the transport. The corresponding edge states are described
by four charge density operators, labelled by s ∈ {L, R} and α ∈ {1, 2}.
These states are mixed (backscattered) at the two QPCs (red dashed lines)
with amplitudes γ_L and γ_R (equations (14) and (15)). The edge densities
enter into the interaction Hamiltonian (equation (12)) through the total
charge Q of the metallic island (equation (13)). The average current ⟨I⟩
is calculated through a cross-section immediately to the right of QPC_R
(vertical blue lines).
Extended Data Figure 5 | Charge quantization based on conductance versus
transmission probability values. a, b, Schematics of the configurations,
both with the same QPC_L setting τ_L = 0.24. In the configuration shown in a,
QPC_R is set to an ‘intrinsic’ conductance G_R = τ_R e²/h = 1.5e²/h, which
decomposes into one ballistic channel and one channel of intrinsic
transmission probability 0.5. In the configuration shown in b, QPC_R is set
to the same intrinsic conductance G_R = 1.5e²/h, which now decomposes
into two non-ballistic channels of intrinsic transmission probabilities
0.7 and 0.8. c, Sweeps of the device conductance are plotted versus gate
voltage for the two configurations (a, red triangles; b, black squares).
Conductance oscillations are visible only in the configuration shown in b,
in the absence of a ballistic channel connected to the island.
Demonstration of a small programmable quantum
computer with atomic qubits


S. Debnath¹, N. M. Linke¹, C. Figgatt¹, K. A. Landsman¹, K. Wright¹ & C. Monroe¹,²,³


Quantum computers can solve certain problems more efficiently
than any possible conventional computer. Small quantum algorithms
have been demonstrated on multiple quantum computing platforms,
many specifically tailored in hardware to implement a particular
algorithm or execute a limited number of computational paths¹⁻¹⁰.
Here we demonstrate a five-qubit trapped-ion quantum computer
that can be programmed in software to implement arbitrary
quantum algorithms by executing any sequence of universal
quantum logic gates. We compile algorithms into a fully connected
set of gate operations that are native to the hardware and have a
mean fidelity of 98 per cent. Reconfiguring these gate sequences
provides the flexibility to implement a variety of algorithms without
altering the hardware. As examples, we implement the Deutsch-Jozsa¹¹
and Bernstein-Vazirani¹² algorithms with average success
rates of 95 and 90 per cent, respectively. We also perform a coherent
quantum Fourier transform¹³,¹⁴ on five trapped-ion qubits for phase
estimation and period finding with average fidelities of 62 and
84 per cent, respectively. This small quantum computer can be
scaled to larger numbers of qubits within a single register, and can
be further expanded by connecting several such modules through
ion shuttling¹⁵ or photonic quantum channels¹⁶.

Implementing a scalable programmable quantum computing architecture
requires high-fidelity initialization and detection at the individual
qubit level and pristine control of interactions between qubits.
Whereas most physical platforms have nearest-neighbour interactions
only, a multi-qubit trapped-ion system features an intrinsic long-range
interaction that is optically gated and connects any pair of qubits.
Unlike solid-state implementations, the quantum circuitry is determined
by external fields, and hence can be programmed and reconfigured
without altering the structure of the qubits themselves. By
optically resolving individual ions, we implement single-qubit rotations
and arbitrary two-qubit gates by directly addressing pairs of ions without
additional overhead such as moving information through local
couplings or hiding qubit populations in additional auxiliary
states¹⁰. Such native gates can then be used to construct modular
logic gates that can be called in reconfigurable algorithm sequences.
We observe a mean fidelity of 98% in these native operations without
the use of spin echo or dynamical decoupling techniques. This
bottom-up approach can be adapted for large-scale computation using
micro-fabricated ion traps with integrated optics and high optical
access, and we expect that gate fidelities could exceed 99.9% with
straightforward improvements to the classical control.

The programmable and reconfigurable nature of the trapped-ion
quantum computer is illustrated by a hierarchy of operations from
software to hardware, shown in Fig. 1a. At the top is a high-level user
interface that specifies the desired algorithm, represented by a standard
family of modular universal logic gates such as Hadamard (H),
controlled-NOT (CNOT), and controlled-phase (CP) gates¹⁴. Next,
a quantum compiler translates the universal gates into gates native to
the hardware, which in our case are two-qubit Ising (XX) gates and
single-qubit rotation (R) gates. Finally, these native gates are decomposed
into laser pulses that are pre-calculated to effect the desired qubit
operation through the Coulomb-coupled motion while disentangling
the motion at the end of the gates.

At the hardware level, the processor consists of trapped ¹⁷¹Yb⁺
atomic ion qubits with information stored in the hyperfine ‘clock’ states
|0⟩ ≡ |F = 0; m_F = 0⟩ and |1⟩ ≡ |F = 1; m_F = 0⟩ of the ²S₁/₂ electronic
ground level with a qubit frequency splitting of ν_0 = 12.642821 GHz
(ref. 27). Here, F and m_F denote the quantum numbers associated with
the total atomic angular momentum and its projection along the
quantization axis defined by an applied magnetic field of 5.2 G. We measure
a qubit coherence time in excess of 0.5 s, and with magnetic shielding
we expect this to improve dramatically (ref. 28).

We confine the ions in a linear radio-frequency Paul trap, with radial
and axial trap frequencies ν_x = 3.07 MHz and ν_z = 0.27 MHz,
respectively. The ions are laser-cooled to near their radial motional ground
state and form a linear crystal with a spacing of approximately 5 μm for
n = 5 ions. A computation is performed by first initializing all qubits
to state |0⟩ through optical pumping. This is followed by quantum
gates, implemented by a series of coherent rotations using stimulated
Raman transitions driven by a 355 nm mode-locked laser, where the
beat-note between two counterpropagating Raman beams drives qubit
and motional transitions. To achieve individual addressing, we split
one of the Raman beams into a static array of beams, each of which is
directed through an individual channel of a multi-channel acousto-optic
modulator (AOM) and focused onto its respective ion, as shown
in Fig. 1b. Finally, the qubit register is measured with high fidelity (see
Methods) by driving the ²S₁/₂ ↔ ²P₁/₂ cycling transition near 369 nm
and simultaneously collecting the resulting state-dependent fluorescence
from each ion using high-resolution optics and a multi-channel
photo-multiplier tube (PMT).

The lowest level of qubit control consists of native single- and two-qubit
operations. We perform single-qubit rotations R_φ(θ) by tuning the
Raman beat-note to qubit resonance ν_0. Here, the rotation angle θ and
axis φ are determined by the duration and phase offset of the beat-note,
which is programmed through radio-frequency signals on appropriate
AOM channels. Two-qubit XX-gates are performed by invoking
an effective spin-spin Ising interaction between qubits mediated by
the collective modes of motion of the chain. Here, we apply Raman
beat-notes tuned close to ν_0 ± ν_x that coherently couple the spins to all
X-modes of motion. A pulse shaping technique disentangles the
motion at the end of the gate, resulting in a two-qubit entangling
rotation of any amount XX(χ_ij). Here, the geometric phase χ_ij originates
from the integrated Ising interaction, the sign α_ij = ±1 of which
arises from the Coulomb interaction between qubits i and j (Fig. 1b
inset). We pre-calculate and optimize XX-gate pulse shapes off-line
for all {i, j} to achieve high fidelity while keeping the gates relatively
fast (see Methods).

We use these native R- and XX-gates to construct standard logic 
gates, which can be called by a quantum algorithm. For instance, we 
Figure 1 | Computation architecture. a, Hierarchy of operations from
software to hardware. See main text for details. b, Hardware setup. A linear
chain of trapped ion qubits along the Z axis is shown at the centre of the
panel (‘Ion chain’). An imaging objective (‘Detection optics’) collects ion
fluorescence along the Y axis and maps each ion onto a multichannel
photo-multiplier tube (PMT) for measurement of individual qubits.
Counterpropagating Raman beams (‘Global’ and ‘Individual’) along the
X axis perform qubit operations. A diffractive beam splitter creates an
array of static Raman beams that are individually switched using a
multi-channel acousto-optic modulator (AOM) driven by radio frequency
(‘Control radio-frequency signals’) to perform qubit-selective gates. By
modulating appropriate addressing beams, any single-qubit rotation or
two-qubit Ising (XX) gate can be realized. For the two-qubit gates between
qubits i and j, we can continuously tune the nonlinear gate angle χ_ij. This
represents a system of qubits with fully connected and reconfigurable
spin-spin Ising interactions (inset).



implement the single-qubit Hadamard gate as H = R_x(−π)R_y(π/2) and
the Z-rotation as R_z(θ) = R_y(−π/2)R_x(θ)R_y(π/2). Two-qubit logic gates
such as CP and CNOT are compiled to account for the signs of the CP
rotation angle β and the Ising interaction α, making them independent
of {i, j} and therefore modular (Fig. 2). At the highest level we program
arbitrary sequences of such logic gates as required to implement
any quantum algorithm.
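
These decompositions can be checked with a few lines of matrix algebra. The sketch below is only an illustration under stated assumptions: it takes the rotation convention R_φ(θ) = exp[−i(θ/2)(cos φ σ_x + sin φ σ_y)], with R_x and R_y corresponding to φ = 0 and φ = π/2, treats the written products as ordinary matrix products, and compares gates up to a global phase. The helper names are hypothetical.

```python
import cmath
import math


def rot(phi, theta):
    """R_phi(theta) = exp(-i*theta/2*(cos(phi) X + sin(phi) Y)) as a 2x2 matrix."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -1j * s * cmath.exp(-1j * phi)],
            [-1j * s * cmath.exp(1j * phi), c]]


def rx(theta):
    return rot(0.0, theta)


def ry(theta):
    return rot(math.pi / 2, theta)


def rz(theta):
    return [[cmath.exp(-1j * theta / 2), 0], [0, cmath.exp(1j * theta / 2)]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def same_up_to_phase(a, b, tol=1e-9):
    """True if a equals b multiplied by a single global phase factor."""
    pairs = [(a[i][j], b[i][j]) for i in range(2) for j in range(2)]
    ref = next((x / y for x, y in pairs if abs(y) > tol), None)
    if ref is None or abs(abs(ref) - 1.0) > tol:
        return False
    return all(abs(x - ref * y) < tol for x, y in pairs)


s = 1 / math.sqrt(2)
HADAMARD = [[s, s], [s, -s]]

h_ok = same_up_to_phase(matmul(rx(-math.pi), ry(math.pi / 2)), HADAMARD)
theta = 0.7  # arbitrary test angle
z_ok = same_up_to_phase(
    matmul(matmul(ry(-math.pi / 2), rx(theta)), ry(math.pi / 2)), rz(theta))
print(h_ok, z_ok)
```

With this convention the first product equals the Hadamard matrix times a global phase i, and the second reproduces R_z(θ) exactly.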

We first implement the Deutsch-Jozsa algorithm¹¹, which determines
whether a given function (the ‘oracle’) is constant or balanced.
A function that has an n-bit input and a 1-bit output (f: {0, 1, 2, ...,
2^n − 1} → {0, 1}) is balanced when exactly half of the inputs result in
the output 0 and the other half in the output 1, while a constant function
assumes a single value irrespective of the input. In our setup we
program 7 out of the 70 possible oracles of three-qubit balanced functions
by using seven different sequences of CNOT gates between each
of the three qubits in the control register x = {X_1X_2X_3} and the function
register X_4 (Fig. 3a). We program the two constant functions by setting
X_4 to either 0 or 1. Executing the algorithm starts with preparing the
control register in the superposition state |x⟩ = (1/√8) Σ_{k=0}^{7} |k⟩, followed
Figure 2 | Two-qubit modular gates. a, Decomposition of the controlled-NOT
(CNOT) gate. The geometric phase χ_ij of the XX-gate is π/4, and
we define α_ij = sgn(χ_ij). b, Decomposition of the controlled-phase (CP)
gate where β = sgn(θ) for the controlled phase θ. The geometric phase of
the XX-gate is adjusted such that χ_ij = α_ij|θ|/4. See main text for details.


by the function evaluation oracle. A CNOT is then performed between
the function register X_4 and the ancilla qubit X_5 (initially set to
(1/√2)(|0⟩ − |1⟩)). All qubits are then rotated and measured (except for the
ancilla) as shown in Fig. 3a. Finally, a measurement of x (conditioned
upon X_4 = 1, occurring with 50% probability) determines if the function
is constant or balanced. Measurement of the output x = {111} indicates
a constant function, while any other value indicates a balanced
function (see Methods). The average success probability is 0.967(2) for
constant and 0.932(3) for balanced functions (Fig. 3b), where the
number in parentheses is the statistical uncertainty (1 s.d.).
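
The counting behind ‘7 out of the 70 possible oracles’ can be reproduced classically. The sketch below enumerates all three-bit balanced truth tables and, as an assumption about which oracles CNOT-only sequences can realize, builds the seven parity functions f(x) = a · x (mod 2) for the non-zero three-bit masks a; whether these coincide exactly with the seven oracles programmed in the experiment is not stated in the text.

```python
from itertools import product
from math import comb

N_BITS = 3
INPUTS = list(product((0, 1), repeat=N_BITS))  # all 8 three-bit inputs


def is_balanced(table):
    """A truth table is balanced when exactly half of its outputs are 1."""
    return sum(table) * 2 == len(table)


# Every possible truth table on 3 bits, filtered down to the balanced ones.
balanced_tables = [t for t in product((0, 1), repeat=2 ** N_BITS) if is_balanced(t)]


def parity_oracle(mask):
    """Truth table of f(x) = mask . x mod 2 (one CNOT per set bit of mask)."""
    return tuple(sum(m & b for m, b in zip(mask, bits)) % 2 for bits in INPUTS)


# Hypothetical CNOT-only oracles: one per non-zero choice of control qubits.
cnot_oracles = [parity_oracle(mask) for mask in INPUTS if any(mask)]

print(len(balanced_tables), comb(8, 4), len(cnot_oracles))
```

Each non-zero mask gives a distinct balanced function, which is consistent with seven CNOT sequences covering 7 of the 70 balanced oracles.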

The Bernstein-Vazirani algorithm is a variant of the Deutsch-Jozsa
algorithm where the oracle function is an inner product of two n-bit
strings: $f_c(x) = c \cdot x$. Here, the aim is to determine the vector
$c = \{c_1c_2 \ldots c_n\}$ in a single trial. We program all 16 instances of
the four-bit oracle that evaluate the function $f_c(x) \oplus X_5$. This is
achieved by applying a particular pattern of CNOT gates, determined by c,
between $x = \{X_1X_2X_3X_4\}$ and
$X_5 = \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$ (Fig. 3c). For example, if
c = {0101} then CNOT gates are applied between $X_2$, $X_5$ and $X_4$, $X_5$.
We start with a superposition state
$|x\rangle = \frac{1}{4}\sum_{k=0}^{15}|k\rangle$ followed by the oracle.
Finally, applying a global $R_y(\pi/2)$ rotation produces the output state
$\bar{c}$, which is the inverse of c. In the experiment, a single-shot
measurement of the correct outcome $\bar{c}$ is obtained with a probability of
0.903(2) (Fig. 3d), averaged over all possible oracle states.
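
As an illustration, the following idealized (noise-free) state-vector sketch follows the steps described above. It assumes the rotation convention $R_y(\theta) = [[\cos\theta/2, -\sin\theta/2], [\sin\theta/2, \cos\theta/2]]$, takes qubit 0 as the leftmost bit, and uses hypothetical helper names (`apply_1q`, `apply_cnot`); it is not the experimental control code.

```python
import math
from itertools import product

def apply_1q(state, gate, q, n):
    """Apply a 2x2 gate to qubit q of an n-qubit state (qubit 0 = leftmost bit)."""
    out = [0j] * len(state)
    shift = n - 1 - q
    for i, amp in enumerate(state):
        if amp == 0:
            continue
        bit = (i >> shift) & 1
        for new_bit in (0, 1):
            j = (i & ~(1 << shift)) | (new_bit << shift)
            out[j] += gate[new_bit][bit] * amp
    return out

def apply_cnot(state, control, target, n):
    """Flip the target bit of every basis state whose control bit is 1."""
    out = [0j] * len(state)
    cs, ts = n - 1 - control, n - 1 - target
    for i, amp in enumerate(state):
        j = i ^ (1 << ts) if (i >> cs) & 1 else i
        out[j] += amp
    return out

def ry(theta):
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -s], [s, c]]

def bernstein_vazirani(c):
    """Return (most likely control-register outcome, its probability)."""
    n = len(c) + 1                       # control register plus one ancilla
    state = [0j] * (2 ** n)
    state[0] = 1.0
    for q in range(len(c)):              # control register -> equal superposition
        state = apply_1q(state, ry(math.pi / 2), q, n)
    state = apply_1q(state, ry(-math.pi / 2), n - 1, n)  # ancilla -> (|0> - |1>)/sqrt(2)
    for q, bit in enumerate(c):          # oracle: CNOT onto the ancilla where c has a 1
        if bit:
            state = apply_cnot(state, q, n - 1, n)
    for q in range(n):                   # global R_y(pi/2) rotation
        state = apply_1q(state, ry(math.pi / 2), q, n)
    probs = {}
    for i, amp in enumerate(state):      # marginal distribution of the control register
        key = tuple((i >> (n - 1 - q)) & 1 for q in range(len(c)))
        probs[key] = probs.get(key, 0.0) + abs(amp) ** 2
    best = max(probs, key=probs.get)
    return best, probs[best]

# Run all 16 four-bit oracles.
results = {c: bernstein_vazirani(c) for c in product((0, 1), repeat=4)}
print(results[(0, 1, 0, 1)])
```

In this ideal model every oracle returns the bit-wise inverse of c with probability close to 1; the measured 0.903(2) reflects gate and detection errors that the sketch does not include.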

Exponential speed-up of many quantum algorithms arises from the 
fact that parallel function evaluation is performed on a superposition 
of all classical input states of an n-bit string. These evaluation paths are 
then interfered using a quantum Fourier transform (QFT) to produce 
the desired solution’. One such example is the order-finding protocol 
in Shor’s quantum factorization algorithm!’. Another application is 
solving the eigenvalue problem $A|\psi\rangle = e^{i\phi}|\psi\rangle$, where the phase $\phi$ can
be estimated to n-bit precision using an n-bit QFT"*. These algorithms 
have been implemented in experiments using a semi-classical version 
of the QFT that consists of single-qubit rotations based on classical 
feed-forward and qubit recycling which reduces the required register 
size*!0°, The coherent QFT, on the other hand, is reversible and can 
be concatenated within an algorithm sequence. 

Here, we construct a coherent QFT on five qubits using all 10 mod- 
ular CP gates and involving a total of 80 single- and two-qubit native 
gates. This circuit fully exploits the high connectivity of a trapped ion 
system and illustrates how it can be scaled to larger modules (Fig. 4a). 
We apply the QFT in a period-finding protocol where we first prepare
an input superposition state $\sum_{k=0}^{31} C_k|k\rangle$ such that the coefficients $\{C_k\}$
exhibit a periodic amplitude or phase modulation (see Methods), which
is followed by the QFT operation. The modulation periodicity then 
appears in the output state populations (Fig. 4b). 
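
A minimal numerical sketch of period finding is given below. It assumes the QFT convention $|k\rangle \to \frac{1}{\sqrt{N}}\sum_j e^{2\pi i jk/N}|j\rangle$ and uses a simple amplitude comb as the input, which is an illustrative choice and not necessarily one of the input states of Extended Data Table 3.

```python
import cmath
import math

def qft(amplitudes):
    """Discrete Fourier transform used as the QFT:
    |k> -> (1/sqrt(N)) * sum_j exp(2*pi*i*j*k/N) |j>."""
    n = len(amplitudes)
    return [
        sum(a * cmath.exp(2j * math.pi * j * k / n) for k, a in enumerate(amplitudes))
        / math.sqrt(n)
        for j in range(n)
    ]

def comb_state(period, size=32):
    """Normalized amplitude comb with equal weight on k = 0, period, 2*period, ..."""
    amps = [1.0 if k % period == 0 else 0.0 for k in range(size)]
    norm = math.sqrt(sum(a * a for a in amps))
    return [a / norm for a in amps]

def peaks(period, size=32, threshold=1e-9):
    """Output states whose population exceeds the threshold."""
    probs = [abs(a) ** 2 for a in qft(comb_state(period, size))]
    return [j for j, p in enumerate(probs) if p > threshold]

print(peaks(4))   # populated output states for an input of period 4
print(peaks(8))   # populated output states for an input of period 8
```

For periods that divide 32, the populated output states are the multiples of 32/period, which is how the modulation periodicity shows up in the output populations.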

Figure 3 | Quantum algorithms. a, The Deutsch-Jozsa algorithm circuit
on 5 ions. The oracle is implemented through gates shown in the shaded
regions of the circuit. For balanced function oracles we apply each of the
seven possible CNOT combinations, indicated in light grey. For the
constant functions, we prepare $X_4 = 0$ or 1 as indicated in dark grey.
b, Measured populations of the output state for various functions,
conditioned upon measuring $X_4 = 1$. The two constant functions f = 0
and f = 1 are indicated in dark grey, and the seven balanced functions
given by particular CNOT gate combinations are indicated in light grey.
Measurement of the output $\{X_1X_2X_3\}$ = {111} = 7 indicates a constant
function, while any other value (0-6) indicates a balanced function.
c, The Bernstein-Vazirani algorithm circuit. The shaded region contains
programmed CNOT gate combinations used to implement different oracle
states c. d, Measured output population for various oracle states. The
output is the inverted oracle state $\bar{c}$. Data represented in b, d are
obtained by sampling over 20,000 experimental repetitions for each function
or the oracle state c, and the errors for the success probabilities in each
case are statistical uncertainties within 1 s.d. The displayed probabilities
are colour coded (key at right).

Figure 4 | Quantum Fourier transform protocol. a, Experimental
sequence for implementation and verification of the quantum Fourier
transform (QFT). 'State preparation' consists of single-qubit rotations that
create a phase and amplitude modulation of the coefficients $\{C_k\}$ of the
input state $\sum_{k=0}^{31} C_k|k\rangle$. The shaded grey region contains a
sequence of modular gates for implementing the QFT, which is then followed by
a measurement of the register. b, Quantum period finding. Input states are
prepared using single-qubit rotations to modulate the 32 state amplitudes
with periods 1, 3, 4, 8, 16, and 32 (see Methods). The squared statistical
overlap (SSO) signifies the fidelity of the protocol where the error is a
statistical estimate over 8,000 experimental repetitions. The grey and red
bars represent populations calculated from theory and measured in the
experiment, respectively. c, Quantum phase estimation using five
measurement qubits. The plot shows populations in the output state that
estimates the given phase modulation $\phi$ of the input state amplitudes
$\{C_k\}$. Probabilities in the output state population are colour coded. We
observe the correct value of the phase in each case with a probability >0.6.
The experiment is repeated 8,000 times for each value of $\phi$.

We further examine the performance of the QFT in a phase estimation
protocol where the eigenvalue $\phi$ is estimated to 5-bit precision.
In this case the input state is prepared in the form
$\frac{1}{\sqrt{32}}\bigotimes_{j=1}^{5}\left(|0\rangle + e^{-i2^{j-1}\phi}|1\rangle\right)$,
which exhibits a $\phi$-dependent phase modulation
$C_k = \frac{1}{\sqrt{32}}e^{-ik\phi}$. We apply the QFT on this state to
estimate $\phi$ by mapping its value onto populations of the output state, as
shown in Fig. 4c. This is repeated for several cases where $\phi$ is
incremented in steps of $2\pi/64$ over the range 0 to $2\pi$. Values of $\phi$
that are integer multiples of $2\pi/32$ result in the output state
$|32\phi/2\pi\rangle$. This is achieved with an average fidelity of 0.619(5).
For non-integer values, the population is distributed between the nearest
5-bit approximate states.
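
The mapping from phase to output state can be illustrated with an idealized sketch. It assumes input coefficients of the form $C_k = e^{-ik\phi}/\sqrt{32}$ and the QFT sign convention $|k\rangle \to \frac{1}{\sqrt{N}}\sum_j e^{2\pi i jk/N}|j\rangle$; both conventions are assumptions chosen so that the output lands on $|32\phi/2\pi\rangle$, and the function names are hypothetical.

```python
import cmath
import math

N = 32  # five measurement qubits

def qft(amplitudes):
    """|k> -> (1/sqrt(N)) * sum_j exp(2*pi*i*j*k/N) |j> (assumed sign convention)."""
    n = len(amplitudes)
    return [
        sum(a * cmath.exp(2j * math.pi * j * k / n) for k, a in enumerate(amplitudes))
        / math.sqrt(n)
        for j in range(n)
    ]

def phase_state(phi):
    """Input state with phase-modulated coefficients C_k = exp(-i*k*phi)/sqrt(N)."""
    return [cmath.exp(-1j * k * phi) / math.sqrt(N) for k in range(N)]

def output_probs(phi):
    return [abs(a) ** 2 for a in qft(phase_state(phi))]

# Phase that is an integer multiple of 2*pi/32: a single output state is populated.
exact = output_probs(2 * math.pi * 5 / N)
print(max(range(N), key=exact.__getitem__))

# Phase halfway between two 5-bit values: population splits over the neighbours.
half = output_probs(2 * math.pi * 5.5 / N)
top_two = sorted(range(N), key=half.__getitem__, reverse=True)[:2]
print(sorted(top_two), [round(half[j], 3) for j in sorted(top_two)])
```

In the ideal case a half-step phase leaves roughly 0.4 of the population in each of the two nearest 5-bit states, with the remainder spread over the other outputs.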

In our experiments, each algorithm fidelity is limited mainly by the 
native gate errors (<2%), which propagate into the standard logic gate 
errors (<5%) (see Methods). These errors are dominated by Raman 
beam imperfections and therefore can be reduced by mitigating 
Raman beam intensity noise”? and individual addressing crosstalk (see 
Methods). Systematic shifts in the axes of the gate rotations accumulate 
due to unequal Stark shifts across the qubits, which result in algorith- 
mic errors that depend upon the circuit structure. This type of error 
can be easily eliminated by feeding forward known shifts to the radio 
frequency of individual qubit control beams. 

The algorithms presented here illustrate the computational flex- 
ibility provided by the trapped-ion quantum architecture. Within a 
single module, this system can be scaled to dozens of qubits by lin- 
early increasing the number of radio-frequency controls and AOM- 
and PMT-channels at the hardware level. In software, the number of 
XX- and R-gate calibrations required to compile any logic gate scales
as $O(n^2)$. As more ions are added to the chain, the axial confinement
must be weakened to maintain a linear crystal. This will increase
the XX-gate duration roughly as $n^{1.7}$, but the crosstalk is not expected
to get worse (see Methods). Finally, implementing this architecture
on multi-zone ion traps such as surface traps will provide further 
control over the connectivity of qubits through shuttling for scalable
computation. This will also enable selective measurement of qubits that 
can be fed-forward classically to perform conditional operations in the 
module? as required for fault-tolerant computing. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS


Experimental techniques. We use a linear radio-frequency Paul trap made of 
four segmented blade electrodes driven at 23.83 MHz where the transverse secular 
frequency of the trap is actively stabilized*!. For measurement, state-dependent 
fluorescence is collected by a 0.38 numerical aperture objective that images ions 
with 0.55 µm resolution. For a single qubit, single-shot detection fidelities for states
$|0\rangle$ and $|1\rangle$ are 99.74(3)% and 99.09(5)%, respectively. For n = 5 qubits, detection is
degraded by signal crosstalk between PMT channels, and the average single-shot
fidelity is 95.3(2)% for the $2^n$ states. For the population distributions measured in
Figs 3 and 4 and the reported algorithm fidelities, multi-qubit detection is performed
by signal-averaging the populations of all $2^n$ states over a few thousand
experimental repetitions. In this way, detection and crosstalk errors are removed 
by decomposing the measurements into the known detector array response of all 
32 possible qubit states. The individual addressing Raman beams are modulated 
using a multi-channel AOM (Model H-601 Series 32-Channel UV Acousto-Optic 
Modulator, PN: 66948-226460-G01, Harris Corporation) and focused down to 
a beam waist of approximately 1.5 µm at the ions. Addressing crosstalk between
neighbouring ions due to Raman beam spillover is <4%, which can be improved 
using higher resolution optics*. 

As more ions are added to a chain, the ratio of axial-to-transverse confinement
must be weakened to maintain a linear crystal
($\nu_{\mathrm{ax}}/\nu_{\mathrm{tr}} < 0.6\,n^{-0.86}$, where
$\nu_{\mathrm{ax}}$ and $\nu_{\mathrm{tr}}$ are the axial and transverse trap
frequencies). For constant transverse confinement, this means that the minimum
ion spacing remains the same. However, this will slow the gates down. In our
setup (for n = 5) two-qubit XX-gates for any ion pair {i, j} have a duration of
$\tau = 235$ µs, which depends on the spectral splitting of the transverse
modes ($\tau \sim \nu_{\mathrm{tr}}/\nu_{\mathrm{ax}}^2 \sim n^{1.7}$). The
XX-gate pulse shape is a 9-segment piecewise-constant Rabi frequency modulation
$\{\Omega_k\}$ (where $1 \le k \le 9$), which is implemented by modulating the
global Raman beam. Optimized pulse shapes are calculated for each ion pair such
that $\{\Omega_k\}$ is within practical limits and the gate fidelity is
maximized. The number of classical calculations to find the pulse shapes scales
as $O(n^2)$. The XX-gates are calibrated by setting the product of the laser
intensities on the two qubits such that $\chi_{ij} = \pi/4$ (refs 26, 34-36).
For CP gates that require other values of $\chi_{ij}$, we scale the laser
intensity accordingly. Single-qubit rotations are calibrated by measuring the
Rabi frequency $\Omega_i$ of individual qubits. Single-qubit native R-gates have
a duration of approximately $0.1\tau$.
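
The scaling statements above can be turned into a rough back-of-the-envelope sketch. The extrapolation of the gate duration from the n = 5 value as a pure power law, and the function names, are assumptions made for illustration only; they are not measured or reported values for other chain lengths.

```python
from math import comb

TAU_5_US = 235.0  # XX-gate duration quoted for n = 5 ions, in microseconds

def xx_pairs(n):
    """Number of ion pairs, i.e. distinct XX-gates to calibrate: n(n-1)/2."""
    return comb(n, 2)

def max_confinement_ratio(n):
    """Upper bound on the axial-to-transverse frequency ratio: 0.6 * n**-0.86."""
    return 0.6 * n ** -0.86

def xx_duration_us(n, exponent=1.7):
    """Assumed power-law extrapolation of the XX-gate duration from n = 5."""
    return TAU_5_US * (n / 5) ** exponent

for n in (5, 10, 20):
    print(n, xx_pairs(n), round(max_confinement_ratio(n), 3), round(xx_duration_us(n)))
```

The pair count grows quadratically, consistent with the quoted calibration scaling, while the exponent 1.7 is close to twice 0.86, as expected if the gate time varies as the inverse square of the axial frequency at fixed transverse confinement.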
Implementation of the Deutsch-Jozsa algorithm. The Deutsch-Jozsa algorithm
is implemented by starting with an equal superposition of all classical input
states to the function $f(x): \{0, 1, \ldots, 7\} \to \{0, 1\}$. We prepare
this by initializing all qubits to $|0\rangle$, followed by $R_y(\pi/2)$
rotations on the qubits in the control register $x = X_1X_2X_3$. Then we rotate
the ancilla qubit $X_5$ using $R_y(-\pi/2)$. The resulting 5-qubit state is

$$|\psi\rangle_0 = \frac{1}{\sqrt{8}}\sum_{x=0}^{7}|x\rangle_{123} \otimes |0\rangle_4 \otimes \frac{|0\rangle_5 - |1\rangle_5}{\sqrt{2}}$$

where x is the decimal representation of qubits $X_1X_2X_3$. Then we apply the
function on the input superposition state such that the value is written to
$X_4$. The resulting state is

$$|\psi\rangle_1 = \frac{1}{\sqrt{8}}\sum_{x=0}^{7}|x\rangle_{123} \otimes |f(x)\rangle_4 \otimes \frac{|0\rangle_5 - |1\rangle_5}{\sqrt{2}}$$

This is followed by a CNOT between the function register $X_4$ and the ancilla
$X_5$ which provides a phase 'kick-back' to produce the state

$$|\psi\rangle_2 = \frac{1}{\sqrt{8}}\sum_{x=0}^{7}(-1)^{f(x)}|x\rangle_{123} \otimes |f(x)\rangle_4 \otimes \frac{|0\rangle_5 - |1\rangle_5}{\sqrt{2}}$$

This is followed by a single-qubit rotation $R_y(\pi/2)$ on all qubits. Then we
measure the first four qubits to reach the solution and ignore the ancilla qubit
since it is not entangled with the other qubits. The state of qubits
$X_1X_2X_3X_4$ before measurement can be written as

$$\frac{1}{8\sqrt{2}}\sum_{y=0}^{7}\sum_{x=0}^{7}(-1)^{x\cdot\bar{y}}|y\rangle_{123} \otimes \left[|0\rangle_4 + (-1)^{f(x)}|1\rangle_4\right] = C_{0000}|0000\rangle + C_{0001}|0001\rangle + \cdots + C_{1110}|1110\rangle + C_{1111}|1111\rangle \qquad (1)$$

where $\bar{y}$ is the bit-wise inversion of y. If f(x) = a is a constant
function (with $a \in \{0, 1\}$), the coefficients of the basis states
$|1110\rangle$ and $|1111\rangle$ are

$$C_{1110} = \frac{1}{\sqrt{2}}, \qquad C_{1111} = \frac{(-1)^a}{\sqrt{2}}$$

If f(x) is a balanced function, then the coefficients are

$$C_{1110} = \frac{1}{\sqrt{2}}, \qquad C_{1111} = \frac{1}{8\sqrt{2}}\sum_{x=0}^{7}(-1)^{f(x)} = 0$$

Here we use the property that f(x) = 0 for exactly half of the values of x and 1
for the rest. Conditioned upon $X_4 = 1$, there is unit probability of measuring
$X_1, X_2, X_3 = 111$ for a constant function and 0 probability of measuring the
same outcome when the function is balanced. In equation (1), note that the
probability of measuring $X_4 = 1$ is 0.5 irrespective of the number of qubits
in the input (control) register of the function.
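
The coefficients can be evaluated numerically as a consistency check. The sketch assumes that the pre-measurement state has the double-sum form $\frac{1}{8\sqrt{2}}\sum_{y}\sum_{x}(-1)^{x\cdot\bar{y}}|y\rangle\,[|0\rangle + (-1)^{f(x)}|1\rangle]$, which follows from applying $R_y(\pi/2)|b\rangle = \frac{1}{\sqrt{2}}[(-1)^b|0\rangle + |1\rangle]$ to each qubit; the function name `dj_coefficients` is illustrative.

```python
import math

def dj_coefficients(f):
    """Return {(y, x4): amplitude} for the four measured qubits, where y is the
    decimal value of X1X2X3 and x4 is the function-register outcome."""
    coeffs = {}
    for y in range(8):
        y_bar = ~y & 0b111                       # bit-wise inversion of y
        for x4 in (0, 1):
            total = 0
            for x in range(8):
                sign = (-1) ** bin(x & y_bar).count("1")   # (-1)^(x . y_bar)
                if x4 == 1:
                    sign *= (-1) ** f(x)                   # phase kick-back term
                total += sign
            coeffs[(y, x4)] = total / (8 * math.sqrt(2))
    return coeffs

def prob_x4_is_one(coeffs):
    return sum(v ** 2 for (y, x4), v in coeffs.items() if x4 == 1)

constant = dj_coefficients(lambda x: 1)          # constant function f = 1
balanced = dj_coefficients(lambda x: x & 1)      # one balanced (parity) function

print(constant[(7, 0)], constant[(7, 1)])        # C_1110 and C_1111, constant case
print(balanced[(7, 0)], balanced[(7, 1)])        # C_1110 and C_1111, balanced case
print(prob_x4_is_one(constant), prob_x4_is_one(balanced))
```

Both cases give a probability of 0.5 for $X_4 = 1$, and only the constant function leaves population in $|1111\rangle$.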

Native single- and two-qubit rotations. Native single-qubit operations
$R_\phi(\theta)$ are rotations of the Bloch vector by an angle $\theta$ about an
axis on the equator of the Bloch sphere, where $\phi$ is the angle between this
rotation axis and the X axis. The single-qubit operator is

$$R_\phi(\theta) = \begin{pmatrix} \cos\frac{\theta}{2} & -i\sin\frac{\theta}{2}\,e^{-i\phi} \\ -i\sin\frac{\theta}{2}\,e^{i\phi} & \cos\frac{\theta}{2} \end{pmatrix}$$

The standard X and Y rotations used in the composite gates are simply
$R_x(\theta) = R_0(\theta)$ and $R_y(\theta) = R_{\pi/2}(\theta)$.

Native two-qubit XX-gates are performed by invoking a $\sigma_x\sigma_x$-Ising
interaction between qubits i and j, which is mediated through the coupling of
the qubits to the collective transverse motional modes of the ion chain. The
resulting two-qubit entangling rotation $XX(\chi_{ij})$ depends on the geometric
phase $\chi_{ij}$, which is the integrated Ising interaction and can be varied
by changing the Raman beam intensity. The sign of the geometric phase
$\alpha_{ij} = \mathrm{sgn}(\chi_{ij})$ depends on how ions i and j couple to
the common transverse motional modes. The XX-gate operator is

$$XX(\chi_{ij}) = \begin{pmatrix} \cos\chi_{ij} & 0 & 0 & -i\sin\chi_{ij} \\ 0 & \cos\chi_{ij} & -i\sin\chi_{ij} & 0 \\ 0 & -i\sin\chi_{ij} & \cos\chi_{ij} & 0 \\ -i\sin\chi_{ij} & 0 & 0 & \cos\chi_{ij} \end{pmatrix}$$

In this experiment, 042, A145, A114, A125, 0135, 23, A34= +1 and a45, A25,043=—1. 
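
The two native operators can be written down directly as matrices. The sketch below assumes the standard forms $R_\phi(\theta) = \cos\frac{\theta}{2}\,I - i\sin\frac{\theta}{2}(\cos\phi\,\sigma_x + \sin\phi\,\sigma_y)$ and $XX(\chi) = \cos\chi\,I - i\sin\chi\,\sigma_x\sigma_x$, with basis order $|00\rangle, |01\rangle, |10\rangle, |11\rangle$; the helper names are illustrative.

```python
import cmath
import math

def r_gate(theta, phi):
    """Single-qubit rotation by theta about an equatorial axis at angle phi."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -1j * s * cmath.exp(-1j * phi)],
            [-1j * s * cmath.exp(1j * phi), c]]

def xx_gate(chi):
    """Two-qubit XX rotation with geometric phase chi."""
    c, s = math.cos(chi), -1j * math.sin(chi)
    return [[c, 0, 0, s],
            [0, c, s, 0],
            [0, s, c, 0],
            [s, 0, 0, c]]

def apply(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def is_unitary(matrix, tol=1e-12):
    """Check that the rows of the matrix are orthonormal."""
    n = len(matrix)
    for i in range(n):
        for j in range(n):
            dot = sum(matrix[i][k] * complex(matrix[j][k]).conjugate() for k in range(n))
            if abs(dot - (1 if i == j else 0)) > tol:
                return False
    return True

# R_y(pi/2) = R_{pi/2}(pi/2) takes |0> to the equal superposition (|0> + |1>)/sqrt(2).
plus = apply(r_gate(math.pi / 2, math.pi / 2), [1, 0])
# XX(pi/4) takes |00> to the maximally entangled state (|00> - i|11>)/sqrt(2).
bell = apply(xx_gate(math.pi / 4), [1, 0, 0, 0])
print(plus, bell)
```

A geometric phase of $\pi/4$ therefore produces a fully entangling gate, while smaller phases give partial entanglement.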


Composite gate fidelity. Controlled-NOT (CNOT) gates are performed between
all ion pairs and characterized in the following way. We perform the CNOT gate
on all four classical input states $|00\rangle$, $|01\rangle$, $|10\rangle$,
$|11\rangle$ and measure the fidelity from the population of the desired output
state. The average fidelity of a CNOT on each ion pair is shown in Extended Data
Table 1.

Controlled-phase (CP) gates are performed between all ion pairs and
characterized by using a sequence of gates. We first initialize the qubits in
the state $\frac{1}{\sqrt{2}}|1\rangle(|0\rangle + |1\rangle)$, where the first
qubit is the control qubit and the second qubit is the target qubit. This is
followed by a conditional phase gate $CP(\theta)$ that creates the state
$\frac{1}{\sqrt{2}}|1\rangle(|0\rangle + e^{i\theta}|1\rangle)$. A final
rotation $R_x(\pi/2)$ on the target qubit projects the phase $\theta$ onto the
population of the target qubit as $P(|1\rangle) = (1 - \sin\theta)/2$.
This is shown in Extended Data Figure 1.

We measure the fidelity of the CP gates at conditional phases
$\theta = \pm\pi/2$, which correspond to the maximum and minimum values of
$\theta$, respectively, that are used in a coherent QFT or QFT$^{-1}$. At these
values of $\theta$, where the geometric phase $\chi_{ij} = \pi/4$, the XX-gates
are most sensitive to laser intensity fluctuations, which leads to maximum
errors. This is evident from the data shown in Extended Data Figure 1, where a
maximum deviation of the target qubit from the ideal output state occurs at
$\pm\pi/2$. Therefore, the fidelity measure at these values is a lower bound
on the CP gate fidelity. The fidelity is obtained by measuring the populations
in the $|10\rangle$ and $|11\rangle$ states for $\theta = +\pi/2$ and
$\theta = -\pi/2$, respectively. Extended Data Table 2 shows the fidelities of
all CP gates.
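
The population formula for the CP-gate characterization can be reproduced with a few lines. The sketch assumes that the final rotation is $R_x(\pi/2) = \frac{1}{\sqrt{2}}[[1, -i], [-i, 1]]$ acting on the target state $(|0\rangle + e^{i\theta}|1\rangle)/\sqrt{2}$; the function name is illustrative.

```python
import cmath
import math

def target_population(theta):
    """Population of |1> on the target qubit after CP(theta) and R_x(pi/2)."""
    # Target state after CP(theta) with the control in |1>.
    a0 = 1 / math.sqrt(2)
    a1 = cmath.exp(1j * theta) / math.sqrt(2)
    # |1> amplitude after R_x(pi/2) = (1/sqrt(2)) [[1, -i], [-i, 1]].
    b1 = (-1j * a0 + a1) / math.sqrt(2)
    return abs(b1) ** 2

for theta in (-math.pi / 2, 0.0, math.pi / 2):
    print(round(theta, 3), round(target_population(theta), 3),
          round((1 - math.sin(theta)) / 2, 3))
```

At $\theta = +\pi/2$ the target ends in $|0\rangle$ and at $\theta = -\pi/2$ it ends in $|1\rangle$, which matches the use of the $|10\rangle$ and $|11\rangle$ populations as the fidelity measures.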

QFT state preparation. For the period-finding experiment, an amplitude or phase
modulation is created in the coefficients $C_k$ of the input state $\sum_{k=0}^{31} C_k|k\rangle$ using
individual single-qubit rotations. Extended Data Table 3 shows the input states for
various measured periodicities. 

Sample size. No statistical methods were used to predetermine sample size. 
Extended Data Figure 1 | Controlled-phase gate. Shown is the performance of the
controlled-phase (CP) gate between control (red) and target (blue) qubit for
different qubit-pairs. The control qubit is prepared in the state $|1\rangle$
which remains unchanged during the gate. Solid blue lines indicate the
theoretical probability of measuring the target qubit in $|1\rangle$ whereas the
data points show experimental data. Error bars are statistical, indicating a 95%
confidence interval for 2,000 experimental repetitions.
Extended Data Table 1 | Controlled-NOT gate fidelities


Ion pair  Fidelity (%) | Ion pair  Fidelity (%)


95.6(6) 97.2(5) 


Gate fidelity is obtained by performing CNOT gates on all possible pairs of ions (‘ion pair’) in a chain of five qubits. 
Extended Data Table 2 | Controlled-phase gate fidelities


Ion pair | θ = π/2, fidelity (%) | θ = −π/2, fidelity (%)


91.1(6) 
93.6(5) 
91.6(6) 
95.9(4) 
90.7(6) 
94.2(5) 
95.8(4) 
91.0(6) 
96.0(4) 
93.5(6) 


Gate fidelity is obtained by performing CP gates on all possible pairs of ions ('ion pair') in a chain of five qubits for conditional phases θ = ±π/2.
Extended Data Table 3 | Input states in QFT-period finding


Input state
See Methods for details.
Carbon-based tribofilms from lubricating oils


Ali Erdemir1, Giovanni Ramirez1, Osman L. Eryilmaz1, Badri Narayanan2, Yifeng Liao1†, Ganesh Kamath2 &
Subramanian K. R. S. Sankaranarayanan2


Moving mechanical interfaces are commonly lubricated and 
separated by a combination of fluid films and solid ‘tribofilms’, 
which together ensure easy slippage and long wear life!. The 
efficacy of the fluid film is governed by the viscosity of the base 
oil in the lubricant; the efficacy of the solid tribofilm, which is 
produced as a result of sliding contact between moving parts, 
relies upon the effectiveness of the lubricant’s anti-wear additive 
(typically zinc dialkyldithiophosphate)”. Minimizing friction and 
wear continues to be a challenge, and recent efforts have focused on 
enhancing the anti-friction and anti-wear properties of lubricants 
by incorporating inorganic nanoparticles and ionic liquids**. Here, 
we describe the in operando formation of carbon-based tribofilms 
via dissociative extraction from base-oil molecules on catalytically 
active, sliding nanometre-scale crystalline surfaces, enabling base 
oils to provide not only the fluid but also the solid tribofilm. We 
study nanocrystalline catalytic coatings composed of nitrides of 
either molybdenum or vanadium, containing either copper or 
nickel catalysts, respectively. Structurally, the resulting tribofilms 
are similar to diamond-like carbon’. Ball-on-disk tests at contact 
pressures of 1.3 gigapascals reveal that these tribofilms nearly 
eliminate wear, and provide lower friction than tribofilms formed 
with zinc dialkyldithiophosphate. Reactive and ab initio molecular- 
dynamics simulations show that the catalytic action of the coatings 
facilitates dehydrogenation of linear olefins in the lubricating 
oil and random scission of their carbon-carbon backbones; the 
products recombine to nucleate and grow a compact, amorphous 
lubricating tribofilm. 

Transportation vehicles account for about 19% of the world’s energy 
consumption and some 23% of total greenhouse-gas emissions every 
year®. With ever-increasing mobility, these numbers will undoubtedly 
surge and present challenges for sustainable transportation”®. So far, 
new efficiency and emission standards imposed on vehicles have been 
the main driving force behind the development of cleaner and more 
fuel-efficient lubricants over the years®. Most efforts have involved 
lowering the viscosity of base oils, and exploring new ways to replace 
zinc dialkyldithiophosphate (ZDDP) and other additives that contain 
sulfated ash, phosphorus and sulphur (SAPS) with more environmentally
friendly alternatives, including inorganic nanoparticles, ionic
liquids and coatings**!°"". It is desirable to further reduce the use of 
environmentally harmful additives that give rise to adverse emissions, 
without compromising on performance in terms of protection against 
friction and wear. 

Here, we report the design and synthesis of a new breed of catalytically
active nanocomposite coatings. These coatings are made of the nitrides of
transition metals such as molybdenum and vanadium (90-95% by atomic weight),
together with metal catalysts such as copper (5-10% by atomic weight);
examples are the MoNx-Cu coating described below and the VN-Cu coating
described in the Supplementary Information. Figure 1a shows a cross-sectional
transmission electron microscopy (TEM) image of the MoNx-Cu coating, which
has a thickness of about 600 nm and was deposited on martensitic chrome steel
substrates (specifically, AISI 52100 steel). The high-resolution transmission
electron microscopy (HRTEM) image in Fig. 1b reveals a very dense and
nanocrystalline structure, composed of copper and MoNx grains of 5-10 nm in
diameter. Figure 1c shows composite Cu-K and Mo-K edge energy-dispersive X-ray
spectroscopy (EDS) images in cross-section, which confirm the presence of
copper-rich clusters throughout the film (Extended Data Figs 1 and 2 show
detailed HRTEM and X-ray diffraction (XRD) analyses of the nanocomposite
coatings). X-ray photoelectron spectroscopy (XPS) revealed peaks belonging to
the key constituents of the coating (molybdenum, copper and nitrogen) and to
some surface contaminants (mostly oxides; Extended Data Fig. 3). We estimated
the amount of copper in the film to be about 5% by XPS. Nanomechanical
characterization of the MoNx-Cu coating revealed hardness values of about
20 GPa and an elastic modulus of around 235 GPa (Extended Data Fig. 4).

We determined the friction and wear behaviour of the nanocomposite with a
ball-on-disk test pair, in which a stationary ball is pressed against a
rotating disk and both are coated in a lubricant (see Supplementary Methods).
Figure 2a compares the friction coefficient of a MoNx-Cu-coated test pair with
those of uncoated 52100-steel test pairs in pure poly-alpha-olefin (PAO) 10 and
fully formulated 5W30 oil (which contains ZDDP and a suite of other additives).
Despite the severe contact pressure (1.3 GPa) between ball and disk, the
friction coefficient of MoNx-Cu is about 0.08 and very steady throughout the
test. The amount of wear is limited to a few scratches; otherwise, the highly
spherical nature of the original ball surface is well preserved (Fig. 2b).
Closer inspection reveals the formation of a tribofilm during the sliding
process. This film accumulates around the trailing edge of the contact spot as
a black debris layer. Right on the top of the contact spot, where sliding has
occurred, the tribofilm appears much thinner and is difficult to discern by
optical microscopy, but we were able to confirm its presence by time-of-flight
secondary-ion mass spectrometry (TOF-SIMS) (Extended Data Fig. 5).

Figure 1 | Structure of the MoNx-Cu nanocomposite coating. a, General
cross-sectional TEM image of a MoNx-Cu sample prepared with a focused
ion beam, confirming a dense, compact coating of approximately 600 nm
thickness. The thin, dark-looking layer above the steel is the molybdenum
bonding layer between the coating and the substrate (60 nm thick). The
layer above the coating is the protective platinum film, needed for focused
ion-beam milling. b, High-resolution TEM image taken from a region near
the top surface, showing the highly nanocrystalline structure of the MoNx-Cu
film with crystal size of less than 10 nm. The inset shows the diffraction
pattern of the polycrystalline structure. δ-MoNx is hexagonal and γ-MoNx
is cubic (see Extended Data Figs 1 and 2). c, Cu-K and Mo-K edge EDS
elemental mapping of the film, confirming the existence of copper clusters
(lighter regions in Mo-K and darker regions in Cu-K images) uniformly
distributed within the film structure (EDS sampling depth is estimated
to be between 1 and 2 µm). The 'grey' image is a STEM image.

1Energy Systems Division, 9700 South Cass Avenue, Argonne National Laboratory, Argonne, Illinois 60439, USA. 2Center for Nanoscale Materials, 9700 South Cass Avenue, Argonne National
Laboratory, Argonne, Illinois 60439, USA. †Present address: Dow Corning Corporation, 2200 West Salzburg Road, Midland, Minnesota 48642, USA.

Figure 2 | Friction and wear behaviour of the MoNx-Cu-coated steel
ball and disk in PAO 10 oil, and comparison with uncoated steel in PAO
10 or fully formulated 5W30 oils. a, Comparison of friction coefficients.
b, Top, optical micrographs obtained from rubbing ball surfaces, showing
the condition and difference in wear scar size for the three tests (see also
Fig. 2a). Below, line scans and three-dimensional profiles reveal the extent
of wear damage more clearly. There is unmeasurable wear for the ball
coated with MoNx-Cu, while the steel ball tested in PAO 10 oil shows the
highest wear loss, revealing a very flat and polished wear scar after the
10-hour test (here, contact pressure was reduced to 70 MPa owing to the
formation of a flat wear scar). The steel ball tested in formulated 5W30
oil shows a much smaller wear scar than the steel ball tested in PAO 10
(contact pressure was reduced to 282 MPa for the test with 5W30 oil).

The friction coefficient of the uncoated 52100 test pair in PAO oil 
is about 0.09 initially, but after about 200 minutes it becomes very 
unsteady, fluctuating between 0.02 and 0.12. This is most likely due to 
the generation and accumulation of many wear-related debris particles 
at the sliding-contact interface, giving rise to high frictional instability 
and severe wear losses (Fig. 2b). In fact, subsequent calculations of 
wear-induced loss for the steel ball indicate a total volumetric wear of 
1.34 x 10° m? (1.05 x 107° g) (Extended Data Fig. 6). The friction 
coefficient of the steel test pair in a fully formulated 5W30 oil is around
0.1 (Fig. 2a); the ball wear (1.26 × 10⁻¹³ m³; 9.86 × 10⁻⁷ g) is much
lower than that of the steel ball tested in PAO but higher than that of
the MoNx-Cu coated ball in PAO (Extended Data Fig. 6). The extent
of wear damage on the flat samples is commensurate with the damage
found on the ball surfaces (Extended Data Fig. 7).

Figure 3 | Raman spectra of the tribofilms produced during tests of
coated and uncoated steel balls in pure PAO 10. a, UV Raman (325 nm)
spectra of: the black debris patch or the deposits found in and around the
edge of the wear scar of a ball coated with MoNx-Cu; the solid, diamond-like
carbon (DLC) film produced by magnetron sputtering; and pyrolytic
graphite (used as a reference). The D and G bands of the carbon deposits
are clearly observed in positions very close to those of the graphite
reference; the predominant intensity of the G peak and the width of the
peaks show in a qualitative way that the carbon deposit (or tribofilm) is
structurally amorphous (as is DLC) with a high fraction of sp² bonding,
suggesting that the carbon deposit produced during rubbing is graphitic.
b, Visible Raman spectra of the brown patch found around the wear scar
of uncoated 52100 steel. These spectra are analogous to those obtained
from Fe2O3 powder (used as a reference), with multiple Raman bands that
correspond to one another.
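
The wear volume and wear mass quoted in the text for the 5W30 test can be related through the density of the ball material. The sketch below assumes a density of about 7.83 g cm⁻³ for AISI 52100 steel, a typical handbook value that is not stated in the source.

```python
# Assumed density of AISI 52100 steel: about 7.83 g/cm^3 = 7.83e6 g/m^3.
STEEL_DENSITY_G_PER_M3 = 7.83e6

def wear_mass_g(volume_m3, density_g_per_m3=STEEL_DENSITY_G_PER_M3):
    """Convert a volumetric wear loss (m^3) into a mass loss (g)."""
    return volume_m3 * density_g_per_m3

# Ball wear reported for the uncoated steel ball in fully formulated 5W30 oil.
print(wear_mass_g(1.26e-13))
```

With this density, the reported wear volume of 1.26 × 10⁻¹³ m³ corresponds to a mass within about 0.1% of the quoted 9.86 × 10⁻⁷ g.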

Elucidating the chemical nature of the debris particles found in and
around the wear scars is important for understanding the mechanisms
that are responsible for the tribological events taking place during
sliding. Accordingly, we have analysed the chemistry of the debris
particles using confocal Raman spectroscopy and TEM. Figure 3a is a
composite of three Raman spectra (obtained with a 325-nm laser): one
from the black debris layer seen in Fig. 2b, one from a diamond-like
carbon (DLC) coating (produced by magnetron sputtering in argon/methane
plasma), and one from pyrolytic graphite (as a reference).
The spectrum from the black debris layer on the MoNx-Cu coated
ball is similar to that from the DLC film, as it shows one-to-one
correspondence of the D (at around 1,395 cm⁻¹) and G (1,600-1,610 cm⁻¹)
bands. Comparison with the spectrum from pyrolytic graphite indicates
that the black debris around the wear scar of the MoNx-Cu coated ball
is made mostly of sp²-bonded carbon (as implied by the broadness
and high intensity of the G band). We attribute the broad feature at
around 2,900 cm⁻¹ to CHx stretching in both the DLC and the debris
layer. The shift of the G band above 1,600 cm⁻¹ (in comparison with
graphite, at 1,583 cm⁻¹) suggests a high degree of disorder in the debris
layer and DLC, as well as the presence of some sp² chains.

Figure 4 | TEM sample preparation and results. a, SEM image of the
area around the wear scar on a steel ball coated with MoNx-Cu; carbon-rich
debris was collected from the area indicated by the red arrow.
b, Collection of the carbon-rich debris (the tribofilm), using a tungsten tip in
the focused ion beam with the help of a micromanipulator. c, High-resolution
TEM near the edge of the scooped debris. The presence of a large amorphous
area, with some nanocrystals and onion-like carbon, is revealed; the electron
diffraction (inset) confirms the amorphous nature of the tribofilm. d, EELS
spectrum (deconvoluted with the zero loss) of the tribofilm, in comparison
with the spectrum for a highly oriented pyrolytic graphite (reference),
showing that the film is made of mainly graphitic carbon.

As for the brownish debris around the wear scar of the 52100 ball 
tested in PAO oil, the Raman analyses (Fig. 3b) confirm that this was 
made mostly of iron oxides (several peaks from the debris match those 
from the Fe2O3 reference). Evidently, during rubbing, the dominant
wear mechanism was oxidative wear, and the oxide-based debris was
thus controlling the wear and friction of this test pair. 

In Fig. 4, we present our findings from TEM of the black debris layers 
shown in Fig. 2b. Briefly, we used a tungsten tip to transfer part of 
the debris from the top right section (denoted with a red arrow) to 
a copper post of an OmniProbe lift-out grid (Fig. 4a, b). The edge of 
the debris had thin regions that were electron transparent. Figure 4c 
shows an HRTEM image, which reveals that the debris is amorphous, 
but that some nanocrystals are also present. These crystals are about 
5 nm in size and come from the MoNx coating (on the basis of the lattice fringe/d
spacing; Extended Data Fig. 8). The EDS of the debris layer reinforces 
this assertion, revealing strong peaks for molybdenum (Extended Data 
Fig. 9). (Because of the background from the TEM column and the 
copper grid, the amount of copper is not quantifiable from the EDS.) 
The presence of such crystalline MoNx particles within the debris is not
surprising, as the MoNx-Cu coating experienced a certain degree of
abrasion (see the scratches in Fig. 2b). During sliding, these crystalline
MoNx particles became an integral part of the debris layer, accumulating
around the edges of the wear scars in Fig. 2b. The electron-diffraction
pattern in Fig. 4c and the electron energy-loss spectra (EELS) in Fig. 4d 
show that most of the debris seen in Figs 2b and 4a is composed of 
amorphous carbon that has high sp² character, as manifested by a strong
π* peak at 285 electronvolts (eV). On the basis of the π* and σ* peaks,
we calculated the fraction of sp²-bonded carbon atoms in the debris to
be about 82%, consistent with the Raman spectra (Fig. 3a). The Raman 
spectra also showed dominant G-peak positions corresponding to those 
of graphite; the broadness of these peaks (compared with those from 
graphite) is due to the smaller cluster size, the cluster distribution, and the 
influence of stress in the debris'”. Onion-like carbon" was also observed 
by HRTEM (Fig. 4c), suggesting the transformation of oil molecules to 
other carbon allotropes at the sliding-contact interface of MoN,—Cu. 

Our results provide clear evidence that the MoNx–Cu coating can
transform PAO molecules (hydrogenated alkenes) into carbon-based
tribofilms at the sliding interface. These films are similar to films of
hydrogenated amorphous carbon with high sp² bonding. Our tribofilms
were able to nearly eliminate wear and to provide lower friction than 
those resulting from 5W30 oil, despite the severe contact pressure 
(Fig. 2b). Mechanistically it is well known that, under lubricated 
sliding conditions, a protective tribofilm is derived from the ZDDP 
in formulated oils*!° mainly because mechanical rubbing under 
high contact pressure and shear stress generates heat15,16, plastic
deformation17, and structural defects18, all of which increase the
chemical activity of the rubbing interface. We expect that these events
occur more favourably with PAO- or mineral-oil-lubricated MoNx–Cu
or VN-Cu coatings than with uncoated surfaces, because the activation 
energy for dissociative extraction of carbon-based tribofilms from the 
long-chain hydrocarbon molecules of lubricating oils is lowered. 

To understand the atomistic mechanism of tribofilm formation, we 
carried out extensive ab initio molecular dynamics (AIMD) and reactive 
molecular-dynamics (RMD) simulations, starting from an initial 
configuration composed of linear olefin chains sandwiched between 
the sliding tribological interfaces (see Supplementary Methods). Both 
simulations were performed at 1,000 K, which is representative of the 
asperity-level flash temperatures of typical tribology experiments15,16.
Our AIMD simulations illustrate the elementary steps involved in
the catalytic dissociation of olefin (Fig. 5a–c). A similar pathway is
predicted by our RMD simulations, which also capture the nucleation 
and growth of the dissociated olefin into a carbon tribofilm 
(Fig. 5d-f). In both simulations we observe that, on non-carbide- 
forming surfaces such as copper, the olefins catalytically convert to 
tribofilms that are reminiscent of hydrogenated amorphous carbon, 
following the atomistic pathway shown in Fig. 5a-c. Initially, the olefin 
molecules are uniformly distributed (Fig. 5d). Then, under the sliding 
action at the Cu-olefin interface, the olefins degrade via two competing 
steps: first, dissociation of C-H bonds near the Cu surface, leading 
to dehydrogenated chains (equation (1) and Fig. 5b, e); and second, 
random scission of backbone C-C bonds to form shorter hydrocarbon 
fragments (equation (2) and Fig. 5c, f). 


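$$\mathrm{C}_{n}\mathrm{H}_{2n} \leftrightarrow \mathrm{C}_{n}\mathrm{H}_{2n-x} + x\,\mathrm{H} \qquad (1)$$

$$\mathrm{C}_{n}\mathrm{H}_{2n} \leftrightarrow \mathrm{C}_{n-m}\mathrm{H}_{2(n-m)} + \mathrm{C}_{m}\mathrm{H}_{2m} \qquad (2)$$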


Note that it is easier to break a C-C bond than a C-H bond owing
to its lower bond-dissociation energy (C-C: ~347 kJ mol⁻¹; C-H:
~414 kJ mol⁻¹; refs 19, 20). However, in the presence of metal catalysts
such as nickel and copper, dehydrogenation is facilitated21–23 via a
three-centred transition state, in which a C-H bond bridges across the metal
until the bond breaks, after which metal-H and metal-C bonds form
(see Supplementary Methods for details of charge-transfer dynamics,
chemical-bond evolution and the electronic origin of catalytic activity
on different metal surfaces). The released free hydrogen either gets
adsorbed onto the metal surface and diffuses into its bulk, or recombines
at the sliding interface to form H2 molecules. Meanwhile, the dissociated

Figure 5 | Atomistic mechanism of tribofilm formation by MoNx–Cu,
deduced from ab initio and reactive molecular-dynamics simulations.
a-c, AIMD simulations illustrate the catalytic action of copper,
dehydrogenating and breaking linear olefins into shorter-chain
hydrocarbons (see Supplementary Video 5). d-f, A similar reaction
pathway is predicted by our reactive molecular-dynamics simulations
(see Supplementary Video 1). The end result is that dehydrogenated
short-chain hydrocarbons recombine to nucleate, and grow into, a
compact amorphous carbon tribofilm. Simulations suggest that this
tribofilm mechanism is suppressed on surfaces where carbide formation
is thermodynamically favoured (see Supplementary Video 2 and
Supplementary Information for the evolution of various chemical
bonds). g, h, Snapshots from reactive molecular-dynamics simulations
show that two carbide-forming surfaces (vanadium and molybdenum)
do not, to all intents and purposes, form tribofilms. Although a sliding
molybdenum surface does (like copper) show olefin degradation, the
catalytic activity at 1,000 K is much lower and the resulting kinetics of
tribofilm formation is very sluggish (see Supplementary Videos 3, 4, 6
and 7). Detailed first-principles calculations show that MoN and VN
have much reduced catalytic activity compared with copper and nickel
(see Supplementary Video 8 and Supplementary Information for the
electronic origin of differences in the catalytic behaviour of various
metals and nitrides).
Panel labels: initial configuration (a, d), dehydrogenation (b, e) and
backbone scission (c, f); top row AIMD, bottom row classical (reactive)
molecular dynamics.

short-chain dehydrogenated hydrocarbons recombine to form graphitic 
tribofilm (Fig. 5d-f; see Supplementary Information for details of the 
temporal evolution of C-H, C-C, metal-C, metal-H and H-H bonds). 
An analysis of the final configuration shows that about 80% of the
carbon atoms are sp²-hybridized, consistent with our experiments.
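
As an illustrative sketch only (it is not part of the reported simulations), the bond-energy comparison above can be put in rough numbers. The snippet treats the two quoted bond-dissociation energies (C-C about 347 kJ mol⁻¹, C-H about 414 kJ mol⁻¹) as if they were activation barriers for uncatalysed bond breaking, which is a simplifying assumption, and evaluates the corresponding Boltzmann factor at the 1,000 K simulation temperature. The function and variable names are hypothetical.

```python
import math

R_GAS = 8.314  # molar gas constant, J mol^-1 K^-1


def boltzmann_ratio(e_low_kj, e_high_kj, temperature_k):
    """Return exp(-E_low/RT) / exp(-E_high/RT) for energies given in kJ/mol."""
    delta_j = (e_high_kj - e_low_kj) * 1000.0  # energy difference in J/mol
    return math.exp(delta_j / (R_GAS * temperature_k))


# Values quoted in the text; using them as barriers is an assumption.
E_CC = 347.0   # C-C bond-dissociation energy, kJ/mol
E_CH = 414.0   # C-H bond-dissociation energy, kJ/mol
T_SIM = 1000.0  # simulation temperature, K

ratio = boltzmann_ratio(E_CC, E_CH, T_SIM)
print(f"Uncatalysed C-C versus C-H Boltzmann factor at {T_SIM:.0f} K: {ratio:.0f}")
```

Under these assumptions, uncatalysed C-C scission is favoured by a factor of order 10³ at 1,000 K, which suggests how much a metal surface would have to lower the dehydrogenation barrier for the two steps to compete.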

It has been shown that graphitic tribofilms can even form in vivo, on 
the rubbing surfaces of metal-on-metal hip replacements (which are 
made of cobalt, chrome and molybdenum)24. Proteins are suspected
to be the main carbon source in these tribofilms, which result from the 
catalytic nature of the cobalt and molybdenum in the metal-on-metal 
alloy. In our case, the extraction of tribofilm from PAO molecules results 
from the catalytic nature of the composite MoNx–Cu coating (while
metal nitrides without a catalyst exhibit much reduced catalytic activity,
as suggested by our simulations; see Supplementary Information).

In summary, we have shown that catalytically active coatings enable
base oils to provide solid tribofilms that protect sliding surfaces against 
friction and wear. Our concept could be extended to the redesign of 
bulk materials to incorporate catalytic metals, leading to the formation 
of similar carbon-based tribofilms. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 

using the DiffTools suite in DigitalMicrograph software”. a, Diffraction
pattern, showing the presence of two phases of MoNx that have been
reported in the Inorganic Crystal Structure Database (ICSD;
http://icsd.fiz-karlsruhe.de/): hexagonal δ-MoN (ICSD accession number 99452) and

Extended Data Figure 2 | X-ray diffraction patterns of the MoNx–Cu
coating. These spectra show the presence of copper, the molybdenum
bonding layer and three different phases of MoNx. The possible MoN
phases that can be produced by physical vapour deposition have been
well studied”, but identifying such phases by XRD is difficult because
the diffraction peaks can overlap. Here, grazing incidence XRD could
not recognize the molybdenum bonding layer (between substrate and
coating) that is identified with the peak at 40.5° in the Bragg–Brentano
measurement, and which corresponds to the (110) plane of molybdenum
(ICSD accession number 52267). The presence of the hexagonal δ-MoN,
cubic γ-Mo2N and cubic B1-MoN in the MoNx–Cu nanocomposite coating
was confirmed using the relevant peaks. The lattice parameters were
calculated on the basis of the d spacing of the peaks. The peaks located at
35.7°, 48.24° and 64.14° correspond to the (200), (202) and (220) planes,
respectively, of δ-MoN (ICSD accession number 99452); lattice parameters
are a = 5.8035 Å and c = 5.7006 Å. The peak at 42.67° is well associated
with the (200) plane of γ-Mo2N (ICSD accession number 158843); this
peak is generally the one used to identify the presence of this phase*®;
the lattice parameter is a = 4.2345 Å. The cubic B1-MoN (ICSD accession
number 159439) was found via the low-intensity peak at 60.11°; the lattice
parameter is a = 4.35 Å. Copper is more evident on the grazing incidence
XRD pattern (owing to the greater surface sensitivity compared with the
Bragg–Brentano method and the possibility of observing planes that are
different to those that are perpendicular to the surface). The peaks located
at 43.68°, 50.88° and 74.82° are associated exclusively with the (111), (200)
and (220) planes, respectively, of copper (ICSD accession number 64699)
(a = 3.5864 Å).
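
The conversion from peak position to lattice parameter for the cubic phases can be sketched as follows. The example assumes Cu Kα1 radiation (wavelength 1.5406 Å), which is not stated in this caption, and applies Bragg's law followed by the cubic relation a = d√(h² + k² + l²). The function names are illustrative rather than taken from any analysis software.

```python
import math

CU_K_ALPHA1_ANGSTROM = 1.5406  # assumed X-ray wavelength (Cu K-alpha1), in angstroms


def d_spacing(two_theta_deg, wavelength=CU_K_ALPHA1_ANGSTROM):
    """Interplanar spacing from Bragg's law, lambda = 2 d sin(theta)."""
    theta = math.radians(two_theta_deg / 2.0)
    return wavelength / (2.0 * math.sin(theta))


def cubic_lattice_parameter(two_theta_deg, hkl, wavelength=CU_K_ALPHA1_ANGSTROM):
    """Lattice parameter of a cubic phase from one indexed reflection."""
    h, k, l = hkl
    return d_spacing(two_theta_deg, wavelength) * math.sqrt(h * h + k * k + l * l)


# Peak positions and Miller indices as quoted in the caption.
a_mo2n = cubic_lattice_parameter(42.67, (2, 0, 0))  # gamma-Mo2N (200)
a_cu = cubic_lattice_parameter(43.68, (1, 1, 1))    # copper (111)

print(f"gamma-Mo2N: a = {a_mo2n:.4f} angstrom")
print(f"copper:     a = {a_cu:.4f} angstrom")
```

With the assumed wavelength, both reflections give values close to the lattice parameters quoted in the caption (4.2345 Å and 3.5864 Å). The hexagonal δ-MoN phase needs two parameters (a and c) and is not covered by this single-reflection sketch.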

Extended Data Figure 3 | X-ray photoelectron spectroscopy of the
MoNx–Cu coating. a, Survey spectrum showing the presence of copper,
molybdenum, nitrogen and oxygen in the film. The composition was
calculated on the basis of the high-resolution peaks of the molybdenum
3d (doublet), oxygen 1s, nitrogen 1s and copper 2p3/2 orbitals. Intensity is
in counts per second. b, High-resolution spectra showing the
c, High-resolution spectrum of the molybdenum 3d orbital. d, High-resolution
spectrum showing the copper 2p3/2 orbital.
Plot axes: intensity (cps) against binding energy (eV).

Extended Data Figure 4 | Nanoindentation analyses of the MoNx–Cu
coating. a, Loading/unloading curves for the measurements, using loads
of between 500 μN and 12,000 μN. The ‘depth’ on the x-axis refers to the
penetration depth of the diamond indentation tip. Different loads were
used to evaluate the hardness (H) and elastic modulus (Er) as a function
of the contact depth (hc), in order to avoid any influence of the substrate’s
mechanical properties. The Oliver–Pharr method” was used to calculate
the hardness and the elastic modulus of the coating. b, Hardness as a
function of penetration. The hardness did not decrease with penetration,
and was 19.6 ± 1.4 GPa. c, The elastic modulus of the coating was
234.7 ± 11.4 GPa across a range of indentations.

Extended Data Figure 5 | TOF-SIMS spectra and mapping of the
contact region. See Supplementary Methods for experimental details.
a, Spectrum of material from inside the contact spot (denoted with a
circle). b, Spectrum of material from outside the contact spot (denoted
with a circle). c, Spectrum after subtraction of the spectra in a and b,
verifying the presence of carbon and other hydrocarbon fractions on
the contact spot (subtraction has been done after normalization of each
spectrum according to total ion counts). d, Two-dimensional TOF-SIMS
images from the contact spot and surrounding regions, confirming the
presence of carbon and hydrocarbon fractions in the contact spot. The
strong carbonaceous signals in the far outside section of the perimeter are
due to the carbon-rich debris layers shown in Figs 2b and 4a.

Extended Data Figure 6 | Wear volumes, calculated using
three-dimensional optical profilometry. a, Three-dimensional representation
of the volume loss of three steel balls (one coated with MoNx–Cu and
two uncoated, in different oils) on the basis of flattening of the
spherical caps of the steel balls. There is very little wear, with only a few
scratches, on the ball coated with MoNx–Cu and tested in PAO 10.
b, Wear volume calculations, based on the profilometry in a, and
compared with geometrical calculations based on the wear scar diameter.
The wear volumes calculated using the two techniques are comparable,
except in the case of MoNx–Cu, where the diameter represents only the
polished Hertzian contact area. 7.35E−15 is 7.35 × 10⁻¹⁵, and so on.
Panels (left to right): MoNx–Cu in PAO 10; steel in PAO 10; steel in 5W-30 oil.
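
A geometrical wear-volume estimate of the kind mentioned in this caption can be sketched by treating the worn region as a spherical cap removed from the ball. This is a minimal illustration under that assumption; the ball radius and scar diameter used below are hypothetical values, not measurements from this study, and the function names are our own.

```python
import math


def cap_height(ball_radius, scar_diameter):
    """Height of the spherical cap removed when a flat scar of given diameter forms."""
    half = scar_diameter / 2.0
    if half > ball_radius:
        raise ValueError("scar diameter cannot exceed ball diameter")
    # Algebraically equal to R - sqrt(R^2 - a^2), but numerically stable for small scars.
    return half * half / (ball_radius + math.sqrt(ball_radius ** 2 - half ** 2))


def cap_volume(ball_radius, scar_diameter):
    """Volume of the removed spherical cap, V = pi h^2 (3R - h) / 3."""
    h = cap_height(ball_radius, scar_diameter)
    return math.pi * h * h * (3.0 * ball_radius - h) / 3.0


# Hypothetical inputs in metres: 5 mm ball radius, 300 micrometre scar diameter.
RADIUS_M = 5.0e-3
SCAR_DIAMETER_M = 300.0e-6

volume_m3 = cap_volume(RADIUS_M, SCAR_DIAMETER_M)
# Small-scar approximation often used as a cross-check: V ~ pi d^4 / (64 R).
approx_m3 = math.pi * SCAR_DIAMETER_M ** 4 / (64.0 * RADIUS_M)

print(f"cap volume:    {volume_m3:.3e} m^3")
print(f"approximation: {approx_m3:.3e} m^3")
```

As the caption notes, such an estimate is only meaningful when the measured diameter bounds material that was actually removed; for a polished Hertzian contact area it would overstate the wear.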

Extended Data Figure 7 | Wear produced on the flat surface of the disk
rubbing against the balls. The optical micrographs at the top show
the physical condition and difference in wear track size in the three tests.
The line scans and three-dimensional images below show the extent of
wear damage more clearly. There is unmeasurable wear for the
MoNx–Cu-coated flat surface. The steel surface tested in PAO 10 shows the highest
wear loss, with a wider wear track and deep scratches; the flat surface
tested in 5W30 oil shows a much smaller and narrower wear track.
Panels (left to right): MoNx–Cu in PAO 10; steel in PAO 10; steel in 5W-30 oil.

Extended Data Figure 8 | Detailed analysis of HRTEM images of the
debris layer of MoNx–Cu. a, HRTEM image showing nanocrystals of
B1-MoN and γ-Mo2N embedded in an amorphous carbon matrix. There
were also onion-like carbons in many of the regions examined. The
d-orbital spacing of 0.341 nm is a typical value for this form of carbon
structure”*”?. b, HRTEM image showing the amorphous matrix with one
crystalline domain (yellow box), which corresponds to -MoN.
Image labels: onion-like carbon; nanocrystals.

Extended Data Figure 9 | EDS of the debris layer of MoNx–Cu tested in PAO 10. The spectrum shows the presence of molybdenum on the tribofilm.
Copper cannot be quantified using this method owing to the background from the TEM column. On the left is a TEM image showing the area of interest
from which the spectrum was generated.

Evidence for climate change in the satellite cloud record

Joel R. Norris, Robert J. Allen, Amato T. Evan, Mark D. Zelinka, Christopher W. O’Dell & Stephen A. Klein


Clouds substantially affect Earth’s energy budget by reflecting 
solar radiation back to space and by restricting emission of thermal 
radiation to space1. They are perhaps the largest uncertainty in our
understanding of climate change, owing to disagreement among 
climate models and observational datasets over what cloud changes 
have occurred during recent decades and will occur in response to 
global warming”. This is because observational systems originally 
designed for monitoring weather have lacked sufficient stability 
to detect cloud changes reliably over decades unless they have 
been corrected to remove artefacts*°. Here we show that several 
independent, empirically corrected satellite records exhibit large- 
scale patterns of cloud change between the 1980s and the 2000s 
that are similar to those produced by model simulations of climate 
with recent historical external radiative forcing. Observed and 
simulated cloud change patterns are consistent with poleward 
retreat of mid-latitude storm tracks, expansion of subtropical dry 
zones, and increasing height of the highest cloud tops at all latitudes. 
The primary drivers of these cloud changes appear to be increasing 
greenhouse gas concentrations and a recovery from volcanic 
radiative cooling. These results indicate that the cloud changes 
most consistently predicted by global climate models are currently 
occurring in nature. 

The International Satellite Cloud Climatology Project (ISCCP) dataset
and the Extended Pathfinder Atmospheres (PATMOS-x) dataset
are the two longest satellite records of cloudiness6,7. The datasets
consist of cloud retrievals made by multiple weather satellites over several
decades, and in their original form, the long-term records suffer from 
spurious variability related to changes in satellite orbit, instrument cali- 
bration, and other factors***, Previous studies using these datasets to 
investigate the cloud response to global warming have obtained incon- 
clusive results owing to the dominating presence of artefacts” !!. Here 
we use corrected versions of ISCCP and PATMOS-x from which
spurious variability has been removed by empirically subtracting all cloud
variability resembling an artefact8. Since one major artefact appears
as coherent spurious cloud changes across the entire area viewed by a 
satellite, the correction procedure unfortunately also removes any real 
cloud variability at near-global scales, thus precluding examination of 
global mean cloud changes. Instead, we examined large-scale patterns 
of observed cloud change for consistency with patterns projected by 
global climate models to occur with climate change’®!*. For corrobo- 
ration of the corrected ISCCP and PATMOS-x records, we additionally 
investigated the change in albedo observed in the 1980s by the Earth 
Radiation Budget Satellite (ERBS)’* and in the 2000s by the Clouds 
and Earth’s Radiant Energy System (CERES) satellite instruments'*’°, 
and changes in the ocean-only liquid water path from the Multi-Sensor 
Advanced Climatology of Liquid Water Path (MAC-LWP) dataset’®. 

Figure 1a displays the spatial distribution of trends during the period 
1983-2009 for the averaged ISCCP and PATMOS-x total cloud amount, 
and Fig. 1b displays the spatial distribution of differences between the 


2002-2014 CERES albedo and the 1985-1989 ERBS albedo. All obser- 
vational records agree that cloud amount and albedo increased over the 
northwest Indian Ocean, the northwest and southwest tropical Pacific 
Ocean, and north of the Equator in the Pacific and Atlantic oceans. 
Cloud amount and albedo decreased over mid-latitude oceans in both 
hemispheres (especially over the North Atlantic), over the southeast 
Indian Ocean, and in a northwest-to-southeast line stretching across 
the central tropical South Pacific. MAC-LWP exhibits a similar trend 
pattern in liquid water path during the period 1988-2014 (Extended 
Data Fig. 1a). Shifting the start or end time of trend calculation by 
several years has little impact on the spatial pattern of change. Similar 
patterns occur for differences in total cloud amount and albedo 
between the periods 1985-1989 and 2003-2009, during which 
ISCCP, PATMOS-x, and ERBS/CERES completely overlap (Extended 
Data Fig. 2). 

Are the observed cloud changes solely a manifestation of natural 
internal variability or are they also a response to external radiative forc- 
ing of the climate system? We addressed this question by examining 
simulations from the Coupled Model Intercomparison Project Phase 
5 (CMIP5) multi-model dataset!’. Historical simulations included 
anthropogenic greenhouse gas concentrations, ozone, land-use 
changes, anthropogenic aerosols, volcanic aerosols, and solar output 
and thus represent our best estimate of the climate response to recent 
external radiative forcing (Extended Data Table 1). Figure 1c displays 
the spatial distribution of trends in ensemble mean modelled total 
cloud amount during the 27-year period 1983-2009 for all radiative 
forcings (ALL). Observations and models exhibit widespread agree- 
ment on which areas have increasing and decreasing cloud amount 
(Fig. 1d). Table 1 lists the spatial correlation between observed and 
simulated cloud trends. 

Could natural internal variability alone produce the correlation 
between the observed and simulated cloud trend patterns? We assessed 
the likelihood of this outcome by examining cloud trends during 
27-year periods from CMIP5 preindustrial simulations without exter- 
nal radiative forcing (Extended Data Table 2). Calculating the spatial 
correlation between the ensemble mean ALL trend pattern and the 
trend pattern of each 27-year preindustrial period generates a prob- 
ability distribution of correlation values arising purely from natural 
internal variability (Extended Data Fig. 3). We found that no 27-year 
period in more than 15,000 years of preindustrial simulations exhibits 
a correlation coefficient as positive as that between the observed and 
ensemble mean ALL trend patterns, suggesting that external radiative 
forcing was a driving factor in large-scale cloud changes from the 1980s 
to the 2000s. 
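
The significance test described above can be sketched in a few lines. This is a minimal illustration on synthetic data, not the analysis code used in the study: it assumes a centred pattern correlation with cosine-of-latitude area weights (the weighting is our assumption) and takes the one-sided P value as the fraction of unforced segments whose correlation with the forced pattern is at least as positive as the observed one. All names and the synthetic inputs are hypothetical.

```python
import math
import random


def weighted_pattern_correlation(x, y, weights):
    """Centred, weighted Pearson correlation between two flattened trend maps."""
    wsum = sum(weights)
    mean_x = sum(w * a for w, a in zip(weights, x)) / wsum
    mean_y = sum(w * b for w, b in zip(weights, y)) / wsum
    cov = sum(w * (a - mean_x) * (b - mean_y) for w, a, b in zip(weights, x, y))
    var_x = sum(w * (a - mean_x) ** 2 for w, a in zip(weights, x))
    var_y = sum(w * (b - mean_y) ** 2 for w, b in zip(weights, y))
    return cov / math.sqrt(var_x * var_y)


def one_sided_p_value(observed_r, null_rs):
    """Fraction of null correlations at least as positive as the observed one."""
    return sum(r >= observed_r for r in null_rs) / len(null_rs)


# Synthetic demonstration: zonal bands centred from 57.5 S to 57.5 N.
random.seed(0)
lats = [-57.5 + 5.0 * i for i in range(24)]
weights = [math.cos(math.radians(lat)) for lat in lats]
forced = [math.sin(math.radians(3.0 * lat)) for lat in lats]  # stand-in for the ALL pattern
observed = [f + random.gauss(0.0, 0.5) for f in forced]       # forced signal plus noise

# Stand-in for the unforced 27-year preindustrial segments: noise only.
null_rs = [
    weighted_pattern_correlation(forced, [random.gauss(0.0, 0.5) for _ in lats], weights)
    for _ in range(1000)
]

r_obs = weighted_pattern_correlation(forced, observed, weights)
print(f"observed r = {r_obs:.2f}, one-sided P = {one_sided_p_value(r_obs, null_rs):.3f}")
```

The resolution of such an empirical P value is limited by the number of null segments, which is why a long pool of preindustrial simulation years matters for the test.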

One prominent feature of Fig. 1, and a robust prediction by climate
models, is the widespread reduction in cloudiness at mid-latitudes9,10,12,18.
Figure 2a, b shows trends in zonal mean total cloud
amount during the period 1983-2009 for ISCCP and PATMOS-x, and 
Fig. 2c shows zonal mean differences between the 2002-2014 CERES 

Figure 1 | Change in observed and simulated cloud amount and albedo
between the 1980s and 2000s. a, Trend in average of PATMOS-x and
ISCCP total cloud amount 1983-2009. b, Change in albedo from January
1985-December 1989 (ERBS) to July 2002-June 2014 (CERES). c, Trend
in ensemble mean total cloud amount 1983-2009 from CMIP5 historical
simulations with all radiative forcings (ALL). d, Locations where majority
of observations and majority of simulations show increases (blue) or
decreases (orange). Black dots indicate agreement among all three satellite
records on sign of change in a and b and trend statistical significance
(P < 0.05 two-sided) in c. All trends and changes are relative to the
60° S–60° N mean change.


albedo and the 1985-1989 ERBS albedo. Every observational record 
exhibits a decline in cloud amount or albedo at mid-latitudes in both 
hemispheres that is nearly always statistically significant. The ocean- 
only MAC-LWP dataset also reports less liquid water path around 40° N 
and 40° S (Extended Data Fig. 1b). Previous research found evi- 
dence for tropical expansion in recent decades'®. Reduced cloudiness 
around 40° N and 40° S is consistent with a poleward expansion of 


Table 1 | Correlation between observed and modelled cloud trend 
patterns 


Forcing type 


Spatial pattern ALL GHG AA OZ NAT 
Grid box total 0.39 (0.0001) 0.21 (0.05) 0.00 0.00 0.26 (0.03)
cloud amount [0.003] [0.08] [0.04]
Zonal mean total 0.80 (0.002) 0.62 (0.008) −0.35 0.27 0.69 (0.03)
cloud amount [0.009] [0.06] [0.03]
Zonal mean cloud 0.76 (0.003) 0.73 (0.004) −0.62 0.73 (0.003)
amount in the [0.03] [0.04] [0.04]
50-180 hPa and
180-310 hPa
intervals


Parentheses and square brackets indicate one-sided P values obtained from the preindustrial
simulations shown in Extended Data Fig. 3 and from formal significance tests, respectively.

Figure 2 | Zonal mean change in observed and simulated cloud amount
and albedo between the 1980s and the 2000s. a, Trend in ISCCP total cloud 
amount 1983-2009. b, Trend in PATMOS-x total cloud amount 1983-2009. 
c, Change in albedo from January 1985-December 1989 (ERBS) to July 
2002-June 2014 (CERES). d, Trend in ensemble mean total cloud amount 
1983-2009 from CMIP5 historical simulations with all radiative forcings 
(ALL). Zonal mean climatology is dotted, linear trend or change is solid, 
circles indicate trend statistical significance (P < 0.05 two-sided), and bars 
indicate the interquartile range of individual simulations. All trends and 
changes are relative to the 60° S-60° N mean change. 


the subtropical dry zone cloud minimum and poleward retreat of the 
storm-track cloud maximum. 

Figure 2d displays trends in zonal mean total cloud amount dur- 
ing the period 1983-2009 from the ALL simulations. Most individual 
simulations exhibit reduced cloud amount in the mid-latitudes of both 
hemispheres, and the ensemble mean trends are statistically significant 
(P< 0.05 two-sided). Furthermore, the majority of simulations repro- 
duce the observed increase in cloud amount and albedo occurring in 
the northern tropics. The spatial correlation between observed and 
simulated zonal cloud trends is highly significant (Table 1 and Extended 
Data Fig. 3). 

Since the correction procedures applied to the satellite datasets 
removed any real global mean change that might be present, for maxi- 
mum comparability we subtracted the 60° S-60° N average change in 
total cloud amount from the model output before creating Figs 1c and d, 
and 2d. Without this adjustment, the ALL ensemble mean cloud 
amount averaged over 60° S-60° N decreases by 0.13% over 25 years. 
Although highly statistically significant (P < 0.0001 two-sided), the 
modelled reduction in 60° S-60° N average cloud amount during the 
period 1983-2009 is far smaller than what is detectable by our obser- 
vational systems. Extended Data Fig. 4a and b shows ALL cloud trends 
without the subtraction of the 60° S-60° N average change. They exhibit 
patterns similar to those seen in Figs 1c and 2d. 
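
The adjustment described in this paragraph, expressing each trend relative to the 60° S-60° N mean, can be sketched as below. This is an illustrative example rather than the procedure actually applied to the model output: it assumes zonal-mean values on latitude bands and cosine-of-latitude area weights, and the function names and sample numbers are hypothetical.

```python
import math


def area_weighted_mean(values, lats_deg, lat_limit=60.0):
    """Cosine-latitude weighted mean of zonal values within +/- lat_limit degrees."""
    num = 0.0
    den = 0.0
    for value, lat in zip(values, lats_deg):
        if abs(lat) <= lat_limit:
            weight = math.cos(math.radians(lat))
            num += weight * value
            den += weight
    return num / den


def remove_regional_mean(values, lats_deg, lat_limit=60.0):
    """Express each zonal value relative to the area-weighted regional mean."""
    mean = area_weighted_mean(values, lats_deg, lat_limit)
    return [value - mean for value in values]


# Hypothetical zonal-mean trends (arbitrary units) on six latitude bands.
lats = [-75.0, -45.0, -15.0, 15.0, 45.0, 75.0]
trends = [0.2, -0.5, 0.1, 0.4, -0.6, 0.3]

relative = remove_regional_mean(trends, lats)
print([round(v, 3) for v in relative])
```

After the subtraction, the area-weighted mean of the relative trends within 60° S-60° N is zero by construction, so only the spatial pattern remains to be compared with the corrected satellite records.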

Figure 3 | Zonal mean change in observed and simulated cloud amount
during the period 1983-2009 in seven pressure intervals. a, ISCCP
climatological cloud amount. b, Trend in ISCCP cloud amount 1983-2009.
c, Trend in PATMOS-x cloud amount 1983-2009. d, Trend in ensemble
mean cloud amount 1983-2009 from CMIP5 historical simulations with
all radiative forcings (ALL). For ISCCP and PATMOS-x, only the amount
of clouds with optical thickness τ > 3.6 is plotted. Black dots indicate trend
statistical significance (P < 0.05 two-sided). All trends are relative to the
60° S-60° N mean trend for that pressure interval.


Another robust prediction by climate models is the rising height 
of high cloud tops at all latitudes10,12,18. Figure 3a displays ISCCP
climatological zonal mean cloud amount within 7 cloud top pressure 
intervals. Only amounts of clouds with optical thickness greater than 3.6 
are plotted to reduce uncertainty in cloud-top pressure retrievals. A local 
maximum in cloud amount occurs in the 180-310 hPa interval in the 
tropics, whereas clouds are typically no higher than 310 hPa in the 
mid-latitude storm tracks, following the latitudinal variation of 
tropopause height. Figure 3b, c shows that the ISCCP and PATMOS-x 
zonal mean cloud amounts increased in the 50-180 hPa interval and 
decreased in the 180-310 hPa interval during the period 1983-2009 in 
the tropics, consistent with a rise in the tops of the highest clouds. The 
increase in cloud amount in the 180-310 hPa interval at mid-latitudes 
similarly suggests a rise in the highest cloud tops. 

Figure 3d displays trends in zonal mean cloud amount during the 
period 1983-2009 from the ALL ensemble mean. The pattern of mod- 
elled cloud trends is highly correlated with the satellite record in the 
50-180 hPa and 180-310 hPa intervals, suggesting that the observed 
rise in cloud top is at least partly due to external radiative forcing.
We expect less agreement below these levels because the ISCCP and 
PATMOS-x satellite retrievals cannot detect lower clouds underneath 
higher clouds, whereas the models report the exact cloud amount at 
each level. We note that the negative trends in cloud amount occurring 
in the 50-180 hPa interval at higher latitudes are relative to the 60° 
S-60° N average cloud change for that interval and do not correspond 
to an actual reduction in cloudiness. Extended Data Fig. 4c, for 
which the 60° S-60° N average change was not subtracted, shows that 
modelled cloud amount in the 50-180 hPa interval merely increases 
less at higher latitudes than at lower latitudes. 

What specific factors are contributing to the observed cloud changes? 
We addressed this question by examining the additional CMIP5 simu- 
lations listed in Extended Data Table 1 with external radiative forcing 
only from greenhouse gases (GHG), only from anthropogenic aerosol 
(AA), only from ozone (OZ), and only from natural solar variations 
and volcanic aerosol (NAT). Extended Data Figs 5, 6 and 7 display the 
ensemble mean modelled cloud trends for GHG, AA, OZ and NAT 
simulations. The GHG and NAT simulations both produce mod- 
elled cloud trend patterns that are significantly correlated with the 
observed cloud trend pattern (Table 1 and Extended Data Fig. 3). This 
includes decreasing total cloud amount at mid-latitudes (GHG and 
NAT), increasing total cloud amount in the northern tropics (NAT), 
and increasing cloud amount in the 50-180 hPa interval in the tropics 
and in the 180-310 hPa interval at mid-latitudes (GHG and NAT). 
In contrast, the AA and OZ simulations do not produce cloud trends 
that globally resemble the observed cloud trends, as demonstrated by 
insufficiently positive correlation (Table 1). The OZ simulations do 
exhibit reduced cloud amount at Southern Hemisphere mid-latitudes”. 

Both the GHG and the NAT simulations experience increasing trop- 
ospheric temperature and decreasing stratospheric temperature from 
the 1980s to the 2000s. This is caused by increasing greenhouse gases
in the former case and a recovery from the 1982 El Chichón and 1991
Pinatubo volcanic aerosol episodes in the latter case?” 4. Tropospheric 
warming and stratospheric cooling promote an increase in the height 
of the highest cloud tops”>”®, and together with global warming, pro- 
mote an expansion of the tropical zone and a poleward shift of storm 
tracks*”, Depleted stratospheric ozone is an additional factor pro- 
moting a poleward shift of the Southern Hemisphere storm track”!”’. 

The expansion of subtropical dry zones results in less reflection of 
solar radiation back to space. As cloud tops rise, their greenhouse effect 
becomes stronger. Both of these cloud changes have a warming effect 
on climate. Our results suggest that radiative forcing by a combination 
of anthropogenic greenhouse gases and volcanic aerosol has produced 
observed cloud changes during the past several decades that exert pos- 
itive feedbacks on the climate system. We expect that increasing green- 
house gases will cause these cloud trends to continue in the future, 
unless offset by unpredictable large volcanic eruptions. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 

METHODS


Satellite datasets. ISCCP provides values of cloud amount in seven cloud top 
pressure intervals and six optical thickness intervals (that is, cloud amount for each 
of 42 ‘cloud types’) from July 1983 to December 2009 (ref. 6). Total cloud amount 
is the sum over all intervals. Cloud-top pressure is most accurately identified when 
clouds are nearly opaque at thermal infrared wavelengths. This occurs when cloud 
optical thickness at visible wavelengths is greater than 3.6. Geostationary satellites 
are the primary contributors to the ISCCP cloud record. We downloaded ISCCP 
D1 data from the Atmospheric Science Data Center located at NASA Langley 
Research Center and applied a correction procedure to remove spurious variability 
associated with changes in satellite orbits, satellite calibration and ancillary data?. 
We note that the correction procedure removes any real global mean cloud varia- 
bility, so all trends presented in this study are with respect to an unknown global 
mean trend, which could be zero. The present study uses only daytime observa- 
tions (defined as solar zenith angle <78°) because visible radiances are required 
to retrieve cloud optical thickness. We found that trends in total cloud amount 
retrieved from day and night infrared radiances are very similar to trends in total 
cloud amount retrieved from daytime visible + infrared radiances.

PATMOS-x provides cloud amount in seven cloud-top pressure intervals and 
six optical thickness intervals starting in October 1981 (ref. 7). The present study 
uses data from January 1983 to December 2009 for consistency with ISCCP. The 
total cloud amount is the sum over all intervals. Polar-orbiting satellites are the 
only contributors to the PATMOS-x cloud record. We downloaded PATMOS-x 
Version 5 Level 3 GEWEX data from https://cimss.ssec.wisc.edu/patmosx/access.html
and applied a correction procedure to remove spurious variability associated
with changes in satellite orbits, satellite calibration, and ancillary data. As for
ISCCP, the correction procedure removes any real global mean cloud variability. 
For consistency over the entire PATMOS-x record, we use products retrieved only 
from the daytime pass of the 'afternoon' satellites.

The passive remote sensing techniques employed by ISCCP and PATMOS-x 
can have difficulty identifying the occurrence of optically thin cirrus overlying 
optically thick lower cloud and can underestimate the height of the cloud top 
when cloud particle density is sparse in the upper portion of an optically thick 
cloud. ISCCP suffers more from remote sensing limitations than PATMOS-x
since the latter dataset uses more wavelengths and thus has more information
available to retrieve cloud properties. The result is a downward bias in reported
cloud-top height relative to that obtained from active remote-sensing techniques, 
for which only a short record is available. ISCCP correspondingly underestimates 
the cloud amount in the 50-180 hPa and 180-310 hPa pressure intervals compared 
to active remote sensing. Despite the bias, a real increase over time in the height 
of the highest cloud tops will nonetheless be reported by ISCCP as an increase in 
the amount of cloudiness in the higher elevation pressure interval and a decrease 
in the lower elevation pressure intervals. The bias may produce an underestimate 
in the magnitude of cloud trends since ISCCP climatological cloud amount is 
underestimated, but this does not undermine our analysis because we compare 
the relative spatial patterns of observed and modelled cloud change rather than 
the absolute magnitudes of observed and modelled cloud change. 

Albedo is a useful parameter for our investigation because variability in cloud 
amount is by far the dominant contributor to variability in albedo outside ice- 
covered regions. Variability in cloud optical thickness is a secondary contributor. 
ERBS albedo values are available for November 1984 through February 1990, but 
we use the January 1985-December 1989 climatology for simplicity. We also use
measurements from ERBS only because the other two satellites contributing to the 
Earth Radiation Budget Experiment (NOAA9 and NOAA10) were not available 
for the entire period. We note that ERBS was in a precessing orbit and sampled the 
entire diurnal cycle. CERES Energy Balanced And Filled (EBAF) Ed2.8 reflected 
solar radiation values are available from the morning satellite Terra starting in 
March 2000 and from the afternoon satellite Aqua starting in July 2002 (ref. 14). 
Since Terra and Aqua are in Sun-synchronous orbits, CERES EBAF uses geostationary
satellites to fill out the diurnal cycle of cloudiness not sampled by Terra
and Aqua. To ensure consistent sampling of the diurnal cycle and seasonal cycle,
we constructed a climatology for July 2002-June 2014. We then divided reflected
solar radiation by incoming solar radiation at the top of the atmosphere to calculate 
albedo. ERBS data were obtained from a CD-ROM provided by the Atmospheric 
Science Data Center located at NASA Langley Research Center, and CERES data 
were obtained from the NASA Langley Research Center CERES ordering tool at
http://ceres.larc.nasa.gov/. Although the datasets are individually well calibrated,
there is no absolute calibration between ERBS and CERES. To bring them to a 
common reference point, we multiplied the ERBS albedo by a constant factor so 
that ERBS and CERES had the same climatological annual albedo averaged over 
60° S-60° N. This means that CERES-ERBS differences are relative to an unknown 
global mean difference that could be zero. 
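
The albedo calculation and the ERBS adjustment described above can be sketched as follows. This is a minimal illustration in Python, not the NCL code used for the study; the function names and the four zonal-band values are hypothetical, and the cosine-latitude weighting follows the Data analysis section.

```python
import math

def area_weighted_mean(values, lats):
    """Mean of per-latitude values weighted by the cosine of latitude."""
    weights = [math.cos(math.radians(lat)) for lat in lats]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

def albedo(reflected, incoming):
    """Albedo = reflected solar radiation / incoming solar radiation (top of atmosphere)."""
    return [r / s for r, s in zip(reflected, incoming)]

def match_reference_mean(erbs_albedo, ceres_albedo, lats):
    """Multiply ERBS albedo by one constant so that its area-weighted mean equals the CERES mean."""
    factor = area_weighted_mean(ceres_albedo, lats) / area_weighted_mean(erbs_albedo, lats)
    return [factor * a for a in erbs_albedo], factor

# Hypothetical zonal-band values for illustration only (not data from the study).
lats = [-45.0, -15.0, 15.0, 45.0]
ceres = albedo([102.0, 88.0, 90.0, 105.0], [300.0, 400.0, 400.0, 300.0])
erbs = [0.35, 0.23, 0.235, 0.36]
erbs_scaled, factor = match_reference_mean(erbs, ceres, lats)
```

Because a single multiplicative constant is applied, the spatial pattern of ERBS albedo is preserved while the domain-mean offset between the two instruments is removed by construction.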
Version 4 of the MAC-LWP dataset for January 1988-December 2014 provides
a useful complement to measurements of cloud amount and albedo. The MAC-LWP
dataset synthesizes passive microwave retrievals from 12 different sensors
using the Remote Sensing Systems version 7 ocean algorithm. Liquid water path
is the spatially averaged vertically integrated amount of cloud liquid water within a 
satellite footprint. Cloud-free areas contribute a value of zero to the spatial average 
within the footprint. Liquid water path increases as clouds become more hori- 
zontally extensive (that is, as cloud amount increases). It also increases as clouds 
become vertically thicker and as cloud water concentration becomes larger. The 
dataset does not include contributions from cloud ice, and retrievals are available 
only over open ocean. To provide a similar basis for comparison to the ISCCP, 
PATMOS-x, and ERBS/CERES datasets, from which global mean variability was 
removed in the correction and adjustment process, we subtracted the 60° S-60° N 
average liquid water path from the value at each grid box for each month. This has 
little impact on the spatial distribution of trends. 
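
Removing the 60° S-60° N average from each monthly field can be sketched as below. This is an illustrative Python sketch under stated assumptions: the field is a nested list indexed as field[month][lat][lon], every grid box holds valid data, and the function names are hypothetical.

```python
import math

def domain_mean(month, lats):
    """Cosine-latitude weighted mean of one monthly field indexed as month[lat][lon]."""
    weights = [math.cos(math.radians(lat)) for lat in lats]
    num = sum(w * sum(row) for w, row in zip(weights, month))
    den = sum(w * len(row) for w, row in zip(weights, month))
    return num / den

def remove_domain_mean(field, lats):
    """Subtract each month's domain mean from every grid box of that month."""
    result = []
    for month in field:
        mean = domain_mean(month, lats)
        result.append([[value - mean for value in row] for row in month])
    return result

# Hypothetical two-month, 3 x 2 field for illustration only.
lats = [-30.0, 0.0, 30.0]
field = [
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    [[10.0, 10.0], [10.0, 10.0], [10.0, 10.0]],
]
relative = remove_domain_mean(field, lats)
```

Differences between grid boxes within a month are unchanged by this step, which is why it has little effect on the spatial distribution of trends.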
CMIP5 simulations. The CMIP5 multi-model dataset provides a large number 
of global climate model simulations for various forcing scenarios. The historical
simulations span ~1850-2005 and include time-varying radiative forcings such 
as greenhouse gases, ozone, anthropogenic and volcanic aerosols, solar output, 
and land use changes (ALL). We extended the CMIP5 ALL simulations beyond 
their nominal ending year of 2005 by adding follow-on years through to 2009 with 
radiative forcing from representative concentration pathway 4.5 (or if not available, 
the historical extended experiment or representative concentration pathway 8.5). 
Total cloud amount is available from 107 realizations from 33 models, and cloud 
amount in each vertical layer is available from 76 realizations from 27 models 
(Extended Data Table 1). We calculated the ensemble mean as a simple average 
of all available realizations. Some models provided only one realization and other 
models provided up to ten realizations for the same external forcing. Natural inter- 
nal variability across the simulations tends to cancel in the ensemble mean, leaving 
behind the radiatively forced component of cloud change. The ensemble mean has 
smaller trend amplitudes than any one realization or the observations due to this 
reduction of natural internal variability. 
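
The effect of averaging over realizations can be illustrated with synthetic data. The sketch below is a hypothetical Python example, not the study's code: each realization is a made-up forced pattern plus independent Gaussian noise standing in for internal variability, and the ensemble mean is the simple average over all realizations.

```python
import random

def ensemble_mean(realizations):
    """Simple average over realizations, grid box by grid box."""
    n = len(realizations)
    return [sum(values) / n for values in zip(*realizations)]

def rms(values):
    """Root-mean-square of a list of numbers."""
    return (sum(v * v for v in values) / len(values)) ** 0.5

random.seed(0)
forced = [0.1 * i for i in range(20)]  # hypothetical forced pattern
realizations = [[f + random.gauss(0.0, 1.0) for f in forced] for _ in range(100)]

mean_pattern = ensemble_mean(realizations)
error_single = rms([a - f for a, f in zip(realizations[0], forced)])
error_mean = rms([a - f for a, f in zip(mean_pattern, forced)])
```

With independent noise, the departure of the ensemble mean from the forced pattern shrinks roughly as one over the square root of the number of realizations. Note that a simple average over realizations gives more weight to models that contributed more realizations.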

A smaller set of CMIP5 models provided additional simulations with external 
radiative forcing only from anthropogenic greenhouse gases (GHG), only from 
anthropogenic aerosol (AA), only from ozone (OZ), and only from natural solar 
variability and volcanic aerosol (NAT) (Extended Data Table 1). A few models 
included ozone variability in GHG simulations, but we excluded these from our 
analysis to avoid confusion about forcing factors. Total cloud amount is available 
from 44 realizations from 14 models for GHG, from 33 realizations from 11 models 
for AA, from 11 realizations from 3 models for OZ, and from 37 realizations 
from 10 models for NAT. Cloud amount in each vertical layer is available from 
35 realizations from 12 models for GHG, 32 realizations from 11 models for AA, 
from 1 realization from 1 model for OZ, and 28 realizations from 8 models for 
NAT. We did not analyse cloud amount in vertical layers for OZ since only one 
realization was available. 

For many of these models, GHG, AA, OZ, and NAT output was available only 
until 2005. To maximize the number of realizations, we chose the 1979-2005 inter- 
val to calculate trends for GHG and AA since this time period has the same length 
as the ISCCP and PATMOS-x records. The four-year shift for trend calculation 
should not matter for the GHG and AA simulations since greenhouse gas and 
aerosol emissions vary on multidecadal rather than interannual timescales. Timing 
matters more for OZ and NAT because the former includes stabilization of the 
ozone hole in the 2000s and the latter includes volcanic eruptions, so we chose 
only those models providing output for the full 1983-2009 time period. We note 
that the set of contributing models and numbers of realizations is not identical 
for the ALL, GHG, AA, OZ, and NAT simulations. We chose to use all available 
simulations from each forcing scenario because restricting our comparison to only 
those models and numbers of realizations in common would result in a much 
smaller sample size. 

Most CMIP5 models provided multicentury simulations of preindustrial con- 
ditions with no anthropogenic or natural external radiative forcing as a control 
case (Extended Data Table 2). Cloud variability in these simulations results only 
from natural internal variability of the coupled ocean-atmosphere-land system, 
including El Niño/Southern Oscillation (ENSO). Total cloud amount is available
for a total of 15,807 years from 27 models, and cloud amount in each vertical layer 
is available for a total of 7,311 years from 13 models. 

Output from CMIP5 simulations was downloaded from the Earth System Grid 
Federation. To provide a similar basis for comparison to the ISCCP, PATMOS-x, 
and ERBS/CERES datasets, from which global mean variability was removed in the 
correction and adjustment process, we subtracted the 60° S-60° N ocean average 
cloud amount from the value at each ocean grid box for each month and subtracted 
the 60° S-60° N land average cloud amount from the value at each land grid box. 
With the exception of cloud amount in the 50-180 hPa interval, this has little
impact on the spatial distribution of trends. For zonally averaged cloud amount
in each vertical layer, we subtracted the 60° S-60° N average.

Since the CMIP5 models do not routinely report amount of cloudiness in various
optical thickness intervals, we could not limit our analysis of modelled clouds
in Fig. 3 to only those with optical thickness greater than 3.6. Cloud amount from 
the satellite records also differs from standard cloud amount output from CMIP5 
models in that the former do not detect clouds with optical thickness less than 
about 0.3, whereas the latter report the amount of all clouds, even very optically 
thin ones. Another difference is that CMIP5 models report actual cloud amount at 
each model layer, not cloud amount unobscured by higher clouds as do the ISCCP 
and PATMOS-x satellite datasets. For a closer comparison to observations, a few 
CMIP5 models incorporated the Cloud Feedback Model Intercomparison Project 
(CFMIP) Observation Simulator Package (COSP). This software produced model
cloud output according to how it would be detected through the limitations of 
satellite retrieval, most often ISCCP. Since it is computationally expensive, COSP 
cloud output is available only for 13 ALL realizations from seven models (Extended 
Data Table 1). We use standard model cloud output in order to have a larger sample 
size but obtain similar but noisier results if we use COSP cloud output from an 
ISCCP satellite simulator (Extended Data Fig. 8). 
Data analysis. ISCCP, PATMOS-x, and ERBS data on a 2.5° × 2.5° latitude-longitude
grid and CERES and MAC-LWP data on a 1° × 1° grid were spatially
averaged to a common 5° × 5° grid. Output from CMIP5 models on a variety of
grid sizes was bilinearly interpolated to a 5° × 5° grid. We linearly interpolated
cloud amount from model layers to a common vertical grid with regular 50 hPa 
spacing in pressure. To display modelled cloud trends in the figures, we linearly 
interpolated trends from the 50 hPa regular grid to the midpoints of the seven 
pressure intervals used by the satellite datasets. All calculations are performed on 
anomalies from the long-term mean. In all cases of spatial averaging and spatial 
correlation, grid box values were weighted by the cosine of the grid box centre 
latitude to account for the variation of grid box area with latitude. We restrict our 
analysis to latitudes equatorward of 60° because passive retrieval of cloud proper- 
ties by satellite is difficult over bright and cold surfaces, and no visible retrievals 
can be made during polar night. 
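
Spatial averaging from a 2.5° grid to a 5° grid with cosine-latitude weighting can be sketched as follows. This is a minimal Python illustration under stated assumptions: the fine grid has an even number of rows and columns, 2 × 2 blocks align with the coarse grid, and the function name is hypothetical.

```python
import math

def coarsen_2x2(field, lats):
    """Average 2 x 2 blocks of field[lat][lon], weighting each fine row by cos(latitude)."""
    coarse, coarse_lats = [], []
    for i in range(0, len(lats) - 1, 2):
        w0 = math.cos(math.radians(lats[i]))
        w1 = math.cos(math.radians(lats[i + 1]))
        row = []
        for j in range(0, len(field[i]) - 1, 2):
            first = field[i][j] + field[i][j + 1]
            second = field[i + 1][j] + field[i + 1][j + 1]
            row.append((w0 * first + w1 * second) / (2.0 * (w0 + w1)))
        coarse.append(row)
        coarse_lats.append(0.5 * (lats[i] + lats[i + 1]))
    return coarse, coarse_lats

# Hypothetical 4 x 4 patch of a 2.5-degree grid, for illustration only.
fine_lats = [-58.75, -56.25, -53.75, -51.25]
fine = [[3.0, 3.0, 3.0, 3.0] for _ in fine_lats]
coarse, coarse_lats = coarsen_2x2(fine, fine_lats)
```

The weighting matters most at high latitudes, where adjacent rows of grid boxes differ noticeably in area.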

We use least-squares linear trends or the average difference between two time 
periods as convenient means of summarizing change over time. Two-sided P values 
for the trends are determined using a conventional Student's t-test with an effective 
sample size that takes temporal autocorrelation into account. 
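
A least-squares trend with an autocorrelation-adjusted sample size can be sketched as below. The text does not state which adjustment was used, so this Python sketch assumes the common lag-1 form n_eff = n(1 - r1)/(1 + r1), with r1 taken from the trend residuals and negative values clipped to zero; the example series is synthetic.

```python
import math

def linear_trend(y):
    """Least-squares slope and intercept of y against time steps 0, 1, 2, ..."""
    n = len(y)
    t_mean = (n - 1) / 2.0
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    slope = sxy / sxx
    return slope, y_mean - slope * t_mean

def lag1_autocorrelation(x):
    """Lag-1 autocorrelation coefficient of a series."""
    mean = sum(x) / len(x)
    num = sum((a - mean) * (b - mean) for a, b in zip(x[:-1], x[1:]))
    den = sum((a - mean) ** 2 for a in x)
    return num / den

def trend_t_statistic(y):
    """Slope, t statistic, and effective sample size (assumed lag-1 adjustment)."""
    n = len(y)
    slope, intercept = linear_trend(y)
    resid = [v - (intercept + slope * t) for t, v in enumerate(y)]
    r1 = max(lag1_autocorrelation(resid), 0.0)  # assumption: ignore negative r1
    n_eff = n * (1.0 - r1) / (1.0 + r1)
    t_mean = (n - 1) / 2.0
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    resid_var = sum(e * e for e in resid) / (n_eff - 2.0)
    return slope, slope / math.sqrt(resid_var / sxx), n_eff

# Synthetic 27-value series: a trend plus a slowly varying oscillation.
series = [0.05 * t + 0.3 * math.sin(0.9 * t) for t in range(27)]
slope, t_stat, n_eff = trend_t_statistic(series)
```

The t statistic would then be compared with a Student's t distribution with n_eff - 2 degrees of freedom. Positive autocorrelation lowers n_eff and therefore raises the P value relative to a test that treats all values as independent.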

For simplicity of comparison with modelled total cloud amount trends in 
calculating correlations in Table 1, we averaged cloud/albedo changes from all 
cloud amount and albedo datasets together before comparing to the ensemble 
mean total cloud amount trends from the CMIP5 ALL, GHG, AA, OZ, NAT and 
preindustrial simulations. Since cloud amount and albedo have different physi- 
cal units, we standardized the grid box changes before averaging. Specifically, we 
divided each ISCCP grid box cloud amount trend by the standard deviation of all 
ISCCP grid box cloud amount trends, each PATMOS-x grid box cloud trend by 
the standard deviation of all PATMOS-x grid box cloud amount trends, and each 
grid box albedo change by the standard deviation of all grid box albedo changes. 
For simplicity of comparison with modelled cloud amount trends in seven pressure 
intervals, we averaged cloud trends from ISCCP and PATMOS-x datasets together. 
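
The standardization step can be sketched as follows. This is an illustrative Python example with made-up grid box values; it assumes the population standard deviation, since the text does not say whether the population or sample form was used.

```python
import statistics

def standardize(changes):
    """Divide every grid box change by the standard deviation of all grid box changes."""
    spread = statistics.pstdev(changes)  # assumption: population standard deviation
    return [c / spread for c in changes]

def combined_pattern(datasets):
    """Average the standardized patterns from several datasets, grid box by grid box."""
    scaled = [standardize(d) for d in datasets]
    return [sum(values) / len(values) for values in zip(*scaled)]

# Hypothetical changes at four grid boxes (cloud amount in %, albedo as a fraction).
isccp = [1.0, -2.0, 0.5, 3.0]
patmosx = [0.8, -1.5, 0.2, 2.5]
albedo_change = [0.004, -0.010, 0.001, 0.012]
pattern = combined_pattern([isccp, patmosx, albedo_change])
```

After this scaling the combined pattern no longer depends on the physical units of any one dataset, so cloud amount and albedo contribute on an equal footing.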

Our null hypothesis is that the observed cloud changes result purely from nat- 
ural internal variability. If so, there should be no systematic relationship between 
the spatial pattern of cloud trends generated by natural internal variability and the 
spatial pattern of cloud trends generated by external radiative forcing. The former is 
represented by individual preindustrial simulations, each with different realizations 
of natural internal variability, and the latter is represented by the ensemble mean of 
simulations with external radiative forcing, in which natural variability has been 
largely averaged out. The suitability of this null hypothesis can be demonstrated by 
calculating the distribution of spatial correlation coefficients (Pearson's r) between 
the pattern of cloud trends from the ensemble mean of forced simulations (ALL, 
GHG, or NAT) and the pattern of cloud trends from time periods of similar length 
obtained from preindustrial control simulations. We build up each null distribu- 
tion by calculating cloud trends and spatial correlation values during a rolling 
27-year period (that is, years 1-27, years 2-28, years 3-29, and so on) through the 
preindustrial control simulation for each model. Extended Data Fig. 3 displays the
frequency distributions of the calculated correlation values. There are 15,104 time
periods for total cloud amount and 6,973 time periods for cloud amount in vertical 
layers. The mean and median correlation values of the null distributions are zero, 
as expected, and the total area under each frequency distribution is equal to unity. 
Our alternative hypothesis is that external radiative forcing was a contributing 
factor in producing the observed cloud trends. If so, we expect a positive spatial 
correlation between the observed trend pattern and the trend pattern from the 
ensemble mean of simulations with external radiative forcing (values shown as 
vertical lines in Extended Data Fig. 3). The P value for a particular spatial correlation 
value r is simply the fraction of correlation values from the preindustrial control 
simulations with values more positive than r (that is, the fraction of area under the 
frequency distribution to the right of the vertical line). For simplicity, we calculate 
P values with respect to a null distribution built from spatial correlation values for 
cloud trends from single time periods. Another option is to build a null distribution 
from spatial correlation values for cloud trends from ensemble means of multiple, 
randomly-selected time periods. Our results are the same for either approach. 
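
The construction of a null distribution and the resulting P value can be sketched as below. This is a minimal Python illustration, not the study's NCL code: the control run is synthetic Gaussian noise at five hypothetical grid boxes, and the forced pattern and the observed correlation of 0.39 are used only as example inputs. A control run of N years yields N - 26 rolling 27-year windows, which is consistent with the 6,973 windows reported for 7,311 years from 13 models.

```python
import math
import random

def trend(series):
    """Least-squares slope of a series against time steps 0, 1, 2, ..."""
    n = len(series)
    t_mean = (n - 1) / 2.0
    y_mean = sum(series) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    return sum((t - t_mean) * (v - y_mean) for t, v in enumerate(series)) / sxx

def weighted_pearson(x, y, w):
    """Pearson's r with grid box weights (cosine of latitude)."""
    total = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / total
    my = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    return cov / math.sqrt(vx * vy)

def null_correlations(control, forced_pattern, weights, window=27):
    """Correlate the trend pattern of every rolling window of control[year][box] with the forced pattern."""
    n_boxes = len(forced_pattern)
    values = []
    for start in range(len(control) - window + 1):
        chunk = control[start:start + window]
        pattern = [trend([year[b] for year in chunk]) for b in range(n_boxes)]
        values.append(weighted_pearson(pattern, forced_pattern, weights))
    return values

def empirical_p_value(null_values, observed_r):
    """Fraction of null correlations that are more positive than the observed one."""
    return sum(1 for r in null_values if r > observed_r) / len(null_values)

# Synthetic 60-year control run at five hypothetical grid boxes.
random.seed(1)
lats = [-50.0, -25.0, 0.0, 25.0, 50.0]
weights = [math.cos(math.radians(lat)) for lat in lats]
forced_pattern = [0.4, -0.3, 0.1, -0.2, 0.5]
control = [[random.gauss(0.0, 1.0) for _ in lats] for _ in range(60)]
null_values = null_correlations(control, forced_pattern, weights)
p_value = empirical_p_value(null_values, 0.39)
```

Overlapping windows are not independent of one another, so the null distribution describes the range of unforced patterns rather than a set of independent samples.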
For corroboration of P values calculated with respect to cloud trend patterns 
from the preindustrial control simulations, we additionally computed one-sided P 
values using a conventional Student's t-test for statistical significance of Pearson's r. 
In this case, a critical parameter is the effective number of spatial degrees of 
freedom. We determined that there are 51 spatial degrees of freedom between
60° S and 60° N for the observed total cloud amount in grid boxes. Simplistically
considered, this corresponds to a set of boxes slightly larger than those outlined
by the latitude-longitude grid lines in Fig. 1, if apportioned equally. However,
remote teleconnections contribute to reduced spatial degrees of freedom in addi- 
tion to local spatial coherence. Zonal means have substantially fewer degrees of 
freedom (8 for total cloud amount and 7 for cloud amount in the 50-180 hPa and 
180-310 hPa pressure intervals). This corresponds to about 15° spacing in latitude. 
P values obtained from formal tests are in some cases substantially larger than those 
obtained from the preindustrial control simulations (Table 1 and Extended Data 
Fig. 3), suggesting that the actual number of effective spatial degrees of freedom 
may be larger than that indicated by the method we used.
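
The formal test can be sketched as follows, using the standard statistic t = r·sqrt(dof/(1 - r²)) for Pearson's r. This Python sketch is an assumption-laden illustration: it passes the effective degrees of freedom directly as dof (the text does not say whether 2 is subtracted), and it integrates the Student's t density numerically with Simpson's rule rather than calling a statistics library.

```python
import math

def t_statistic(r, dof):
    """t statistic for a correlation r with dof degrees of freedom."""
    return r * math.sqrt(dof / (1.0 - r * r))

def t_pdf(x, dof):
    """Probability density of Student's t distribution."""
    coeff = math.gamma((dof + 1) / 2.0) / (math.sqrt(dof * math.pi) * math.gamma(dof / 2.0))
    return coeff * (1.0 + x * x / dof) ** (-(dof + 1) / 2.0)

def one_sided_p(t, dof, steps=2000):
    """P(T > t) for t >= 0, from Simpson's rule on [0, t] (steps must be even)."""
    if t < 0:
        raise ValueError("this sketch handles non-negative t only")
    h = t / steps
    total = t_pdf(0.0, dof) + t_pdf(t, dof)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, dof)
    return 0.5 - total * h / 3.0

# Example inputs: r = 0.39 as in Table 1, with the grid box and zonal-mean
# degrees of freedom quoted in the text (treated here as dof, an assumption).
p_grid = one_sided_p(t_statistic(0.39, 51), 51)
p_zonal = one_sided_p(t_statistic(0.39, 8), 8)
```

For the same correlation, fewer effective degrees of freedom give a larger P value, which is why the estimate of spatial degrees of freedom is the critical parameter in this test.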
ENSO-like variability. The dominant source of multiyear natural variability in 
the climate system is the El Niño/Southern Oscillation (ENSO) phenomenon.
Variability occurring at interdecadal time scales, especially over the Pacific basin,
exhibits a pattern similar to that of ENSO. We investigated whether the observed
cloud changes are a manifestation of ENSO-like variability by calculating the correlation
of the spatial pattern of cloud trends with the spatial pattern of the difference
between observed La Niña composite cloud anomalies and El Niño composite
cloud anomalies (figure not shown due to space limitations). The correlation
between the observed La Niña-El Niño pattern and the observed trend pattern is
only 0.13, and the spatial correlation between the observed La Niña-El Niño pattern
and the ensemble mean ALL trend pattern is only 0.14. Considering that the
spatial correlation between the observed trend pattern and the ensemble mean ALL 
trend pattern is 0.39 (Table 1), we think ENSO-like variability cannot be a major 
contributor to the global pattern of cloud change between the 1980s and the 2000s. 
Code availability. All plots and calculations were produced with custom code 
using the NCAR Command Language (NCL) Version 6.3.0
(http://dx.doi.org/10.5065/D6WD3XH5S). This code can be obtained by contacting the
corresponding author.
Extended Data Figure 1 | Change in observed liquid water path during
the period 1988-2014. a, Linear trend in MAC-LWP liquid water path
during January 1988 to December 2014. b, Zonal mean climatology (red)
and trend (black) in MAC-LWP liquid water path during January 1988-
December 2014 given as g m⁻² per 25-year period. Circles indicate trend
statistical significance (P < 0.05 two-sided). All trends are relative to the
60° S-60° N mean trend given as g m⁻² per 25-year period.
Extended Data Figure 2 | Change in observed cloud amount and
albedo between January 1985 to December 1989 and January 2003
to December 2009. a, Change in ISCCP total cloud amount given as a
percentage amount per 25-year period. b, Change in PATMOS-x total
cloud amount given as a percentage amount per 25-year period. c, Change
in ERBS/CERES albedo given as a percentage albedo per 25-year period.
Black dots indicate agreement among all three satellite records on sign
of change in a, b and c. All changes are relative to the 60° S-60° N mean
change.
Extended Data Figure 3 | Correlation between forced simulated,
unforced simulated, and observed cloud trend patterns. Distribution of
correlation between patterns from multiple unforced CMIP5 preindustrial
simulations and the ensemble mean of CMIP5 simulations with all
radiative forcings (ALL, black), only greenhouse radiative forcings
(GHG, red), and only natural radiative forcings (NAT, blue) for 27-year
trends. a, Grid box total cloud amount. b, Zonal mean total cloud amount.
c, Zonal mean cloud amount in the 50-180 hPa and 180-310 hPa intervals.
Vertical lines indicate correlation between the observed pattern and
ensemble mean ALL, GHG, or NAT patterns. Fractional area under each
distribution to the right of the vertical line is the corresponding P value
in Table 1.
Extended Data Figure 4 | Absolute change in simulated cloud amount
during the period 1983-2009. a, Linear trend in ensemble mean total
cloud amount 1983-2009 from CMIP5 historical simulations with all
radiative forcings (ALL). Data given as a percentage amount per 25-year
period. b, Zonal mean climatology (red) and trend (black) in ensemble
mean total cloud amount 1983-2009 from ALL simulations. c, Zonal mean
trend in ensemble mean cloud amount 1983-2009 in seven pressure
intervals from ALL simulations. Data given as a percentage amount per
25-year period. Black dots and circles indicate trend statistical
significance (P < 0.05 two-sided), and bars indicate interquartile range
of individual simulations. Unlike
Figs 1c, 2d and 3d, trends are not relative to the 60° S-60° N mean trend.
Extended Data Figure 5 | Change in simulated cloud amount between
the 1980s and 2000s for different types of forcing. Linear trend in
ensemble mean total cloud amount from CMIP5 simulations, given as a
percentage amount per 25-year period, and locations where the
majority of observational (Obs) datasets and the majority of simulations
show increasing (blue) or decreasing (orange) cloud amount or albedo.
a, b, Only greenhouse gas (GHG) forcing 1979-2005. c, d, Only
anthropogenic aerosol (AA) forcing 1979-2005. e, f, Only ozone (OZ)
forcing 1983-2009. g, h, Only natural (NAT) forcing 1983-2009. Black
dots indicate trend statistical significance (P < 0.05 two-sided). All trends
are relative to the 60° S-60° N mean trend.
Extended Data Figure 6 | Zonal mean change in simulated cloud
amount between the 1980s and 2000s for different types of forcing.
Climatology is red and linear trend is black for ensemble mean total cloud
amount from CMIP5 simulations. Data given as a percentage amount
per 25-year period. a, Only greenhouse gas (GHG) forcing 1979-2005.
b, Only anthropogenic aerosol (AA) forcing 1979-2005. c, Only ozone
(OZ) forcing 1983-2009. d, Only natural (NAT) forcing 1983-2009.
Circles indicate trend statistical significance (P < 0.05 two-sided) and
bars indicate interquartile range of individual simulations. All trends are
relative to the 60° S-60° N mean trend.
Extended Data Figure 7 | Zonal mean change in simulated cloud
amount between the 1980s and 2000s in seven pressure intervals
for different types of forcing. Linear trend for ensemble mean cloud
amount from CMIP5 simulations. a, Only greenhouse gas (GHG) forcing
1979-2005. Data given as a percentage amount per 25-year period. b, Only
anthropogenic aerosol (AA) forcing 1979-2005. c, Only natural (NAT)
forcing 1983-2009. Black dots indicate trend statistical significance
(P < 0.05 two-sided). All trends are relative to the 60° S-60° N mean trend
for that pressure interval.
Extended Data Figure 8 | Change in simulated cloud amount during
the period 1983-2009 from COSP. a, Linear trend in ensemble mean 
total cloud amount 1983-2009 from CMIP5 historical simulations with all 
radiative forcings (ALL). Data given as a percentage amount per 25-year 
period. b, Zonal mean climatology (red) and trend (black) in ensemble 
mean total cloud amount 1983-2009 from ALL simulations. c, Zonal 
mean trend in ensemble mean cloud amount 1983-2009 in seven pressure 
intervals from ALL simulations. Black dots and circles indicate trend 
statistical significance (P < 0.05 two-sided), and bars indicate interquartile 
range of individual simulations. All trends relative to the 60° S-60° N 
mean trend. 
Extended Data Table 1 | CMIP5 models and number of simulations
used for each forcing experiment 


Model ALL (Standard, COSP) GHG AA OZ NAT

ACCESS1-0 (1) 

ACCESS1-3 (1) 

BCC-CSM1-1 3 1 | 
BCC-CSM1-1-m 3 

BNU-ESM 1 1 
CanESM2 5 1 5 5 5 
CCSM4 6 3 3 
CESM1-CAM5 3 
CESM1-CAM5-1-FV2 1 2 Z 
CESM1-WACCM 3 
CMCC-CM 1 
CMCC-CMS 1 
CNRM-CM5 (10) (5) (5) 
CSIRO-Mk3-6-0 10 5 5 5 
EC-EARTH (6) 
FGOALS-g2 1 2 4 
GFDL-CM3 ql 3 
GFDL-ESM2G 1 
GFDL-ESM2M 1 1 1 
GISS-E2-H 5 5 5 (5) 5 
GISS-E2-H-CC 1 
GISS-E2-R 5 5 5 (5) 5 
GISS-E2-R-CC 
HadCM3 (10) 

HadGEM2-ES (4) 1 (4) (4) 
IPSL-CM5A-LR 4 =] 1 3 
IPSL-CM5A-MR 1 3 3 

MIROC5 5 5
MIROC-ESM 1 3 
MIROC-ESM-CHEM 1 1 
MPI-ESM-LR 3 1 
MPI-ESM-MR 3 
MRI-CGCM3 3 1 
NorESM1-M 3 1 1 


Parentheses indicate unavailability of vertical distribution of cloud amount. Empty cells indicate 
unavailability of model output for that experiment. 
Extended Data Table 2 | CMIP5 models and number of years in each
preindustrial control simulation

Model              Total Cloud Amount   Layer Cloud Amount
ACCESS1-0          500
ACCESS1-3          500
BCC-CSM1-1         500                  500
BCC-CSM1-1-m       400                  400
BNU-ESM            559                  559
CanESM2            996                  796
CCSM4              1000                 1000
CESM1-CAM5         319
CESM1-WACCM        200
CNRM-CM5           850
CSIRO-Mk3-6-0      500
FGOALS-g2          700
GFDL-CM3           500                  500
GFDL-ESM2G         500                  500
GFDL-ESM2M         500                  500
GISS-E2-H          540
GISS-E2-R          550
HadGEM2-ES         337
IPSL-CM5A-LR       1000
IPSL-CM5A-MR       300
MIROC5             670                  670
MIROC-ESM          630                  630
MIROC-ESM-CHEM     255                  255
MPI-ESM-LR         1000
MPI-ESM-MR         1000
MRI-CGCM3          500                  500
NorESM1-M          501                  501

Empty cells indicate unavailability of model output for that experiment.
A novel excitatory network for the control of breathing

Tatiana M. Anderson, Alfredo J. Garcia III, Nathan A. Baertsch, Julia Pollak, Jacob C. Bloom, Aguan D. Wei,
Karan G. Rai & Jan-Marino Ramirez


Breathing must be tightly coordinated with other behaviours 
such as vocalization, swallowing, and coughing. These behaviours 
occur after inspiration, during a respiratory phase termed 
postinspiration. Failure to coordinate postinspiration with
inspiration can result in aspiration pneumonia, the leading cause
of death in Alzheimer's disease, Parkinson's disease, dementia, and
other neurodegenerative diseases. Here we describe an excitatory
network that generates the neuronal correlate of postinspiratory
activity in mice. Glutamatergic-cholinergic neurons form the
basis of this network, and GABA (γ-aminobutyric acid)-mediated
inhibition establishes the timing and coordination relative to 
inspiration. We refer to this network as the postinspiratory complex 
(PiCo). The PiCo has autonomous rhythm-generating properties 
and is necessary and sufficient for postinspiratory activity in vivo. 
The PiCo also shows distinct responses to neuromodulators when 
compared to other excitatory brainstem networks. On the basis of 
the discovery of the PiCo, we propose that each of the three phases 
of breathing is generated by a distinct excitatory network: the 
pre-Bötzinger complex, which has been linked to inspiration; the
PiCo, as described here for the neuronal control of postinspiration;
and the lateral parafacial region (pFL), which has been associated
with active expiration, a respiratory phase that is recruited during
high metabolic demand.

Neurons that are active in phase with postinspiratory activity have 
previously been identified in the Bötzinger complex (BötC), a region
that is primarily comprised of inhibitory neurons. However, the
source of excitation that drives this inhibitory network is not well- 
defined. In this study, we have identified the location and neurochemical 
phenotype of an excitatory and rhythmogenic neuronal population that 
is specifically active during postinspiration. 

We developed a horizontal slice preparation in postnatal day (P)5-10 
mice that captures the ventral extent of the medulla, including the 
ventral respiratory column (VRC; Fig. 1a and Extended Data Fig. 1),
and recorded extracellular, bilaterally synchronized respiratory rhythmic
population activity. Inspiratory population activity was identified
within the pre-Bötzinger complex (preBötC; Fig. 1a), a network that
is necessary and sufficient for generating inspiration. Horizontal
slices also generated postinspiratory population activity that discharged
immediately after, but never during, inspiratory activity, and followed
sighs generated in the preBötC (Fig. 1a).

Postinspiratory bursts occurred spontaneously on average after 1 out 
of 12 preBötC bursts in the horizontal slice (Fig. 1a and Extended Data
Fig. 2). The decreased excitability of postinspiratory activity compared
to preBötC activity could be due to the absence of the pons, which,
in vivo, provides descending neuromodulatory inputs, including norepinephrine,
to the medulla. Indeed, postinspiratory activity in our
slice preparation was exquisitely sensitive to norepinephrine. In the
presence of 2 μM norepinephrine, postinspiratory population activity
occurred with nearly every inspiratory cycle (Fig. 1a and Extended Data
Fig. 2). Therefore, we used this norepinephrine concentration as a tool 
to facilitate postinspiratory activity in vitro. 

Postinspiratory population activity was most pronounced approximately
400 μm rostral to the preBötC, dorsal to the BötC, and caudal
to the facial (VII) nucleus. We refer to this area as the postinspiratory
complex (PiCo; Fig. 1a and Extended Data Fig. 1). To assess the distribution
of postinspiratory activity, we positioned one electrode in the
PiCo region and a second electrode contralaterally to map the amplitude
of postinspiratory population activity across the VRC (Fig. 1a).
Postinspiratory activity was concentrated rostral to the preBötC, but
extended caudally and partially overlapped with inspiratory activity. 

We anatomically characterized the PiCo by immunohistological 
labelling of transverse sections (Fig. 1b, left), revealing that choline 
acetyltransferase (ChAT)-positive cholinergic and vesicular glutamate 
transporter 2 (Vglut2; also known as Slc17a6)-expressing glutamater- 
gic neurons in Vglut2-cre;Ai6 cre reporter mice co-localized in the 
PiCo, dorsomedial to the nucleus ambiguus (Fig. 1b, right). By contrast, 
ChAT-positive neurons in the nucleus ambiguus lacked substantial 
Vglut2 expression (Extended Data Fig. 3). In the sagittal plane, PiCo 
neurons that were co-labelled with ChAT and Vglut2 were located dor- 
sal and caudal to the VII nucleus (Fig. 1c, left). Quantification of ChAT 
and Vglut2 co-expression revealed that the PiCo mainly extends from 
40 to 280 µm medial to the nucleus ambiguus and −50 to 250 µm caudal 
to the VII nucleus caudal border (Fig. 1c, right). In situ hybridization 
confirmed that Vglut2 mRNA was expressed in Chat-derived PiCo 
neurons in transverse sections from Chat-cre;Ai14 mice (Fig. 1d, e). 

The Cre-dependent reporter line Ai27, which conditionally expresses 
channelrhodopsin-2 (ChR2) fused to td-Tomato in the presence of a 
selective, promoter-driven Cre, allowed photo-stimulation of specific 
neuronal sub-populations!’. We recorded from PiCo neurons in 
Vglut2-cre;Ai27 and Chat-cre;Ai27 horizontal slices. Membrane depolarization 
of tetrodotoxin (TTX)-isolated PiCo neurons during light 
stimulation demonstrated that functionally identified postinspiratory 
cells were glutamatergic and cholinergic (Fig. 2a), consistent with the 
histological results. PiCo neurons generated neither pre-inspiratory 
bursts nor the biphasic discharge typical of pre-inspiratory neurons in 
the retrotrapezoidal nucleus parafacial respiratory group (RTN/pFRG) 
region! 

Postinspiratory population activity was unaffected by the use of 
bath-applied strychnine to block glycinergic inhibition (Fig. 2b and 
Extended Data Fig. 4). However, PiCo and preBotC bursts progressively 
synchronized following blockade of GABAergic inhibition with gaba- 
zine, in the presence (Fig. 2b and Extended Data Fig. 5) or absence (data 
not shown) of strychnine. The burst area of postinspiratory activity 
was increased when synaptic inhibition was blocked (Extended Data 
Fig. 4), indicating that the PiCo rhythm is modulated, but not gener- 
ated, by inhibitory mechanisms. Inspiratory and postinspiratory bursts 


¹Seattle Children’s Research Institute, Center for Integrative Brain Research, Seattle, Washington 98101, USA. ²University of Washington School of Medicine, Graduate Program in Neuroscience, 
Seattle, Washington 98195, USA. ³University of Washington School of Medicine, Department of Neurological Surgery, Seattle, Washington 98105, USA. 


*These authors contributed equally to this work. 


76 | NATURE | VOL 536 | 4 AUGUST 2016 


© 2016 Macmillan Publishers Limited, part of Springer Nature. All rights reserved 


[Figure 1 image. Panels: a, horizontal slice (2 µM NE; fictive postinspiration and fictive sigh traces); b, transverse section (Vglut2-cre;Ai6/Cre/ChAT/DAPI; Bregma −6.64); c, sagittal section (Cre (Vglut2-cre;Ai14)/ChAT; total ChAT+ and ChAT+/Vglut2+ cells against medial distance from medial NA (µm) and caudal distance from caudal VII N (µm)); d, e, Chat-cre;Ai14.] 


Figure 1 | Horizontal slice and anatomy of the PiCo. a, Population 
bursts in PiCo (black), preBötC (purple). Norepinephrine (NE, 2 µM) 
stimulates PiCo bursts (n = 23). Left, schematic; right, heat map of PiCo 
burst amplitude (n = 6). PiCo bursts follow fictive sighs. b, Left, 
immunohistochemical labelling of PiCo dorsomedial to nucleus 
ambiguus (NA). Middle, higher magnification showing localization 
of ChAT and Vglut2 (ZsGreen1) within the PiCo of Vglut2-cre;Ai6 
mice. Right, arrows: higher magnification of triple-labelled ChAT+ 
ZsGreen1+ Cre+ PiCo cells (merged image above, individual images 
below; n = 5). c, Left, ChAT+ PiCo neurons (dashed box) in sagittal view. 
Inset, magnification of dashed box showing ChAT+ Cre+ PiCo neurons 
in Vglut2-cre;Ai14 mice. Right, quantification of ChAT- and Vglut2-expressing 
PiCo cells mediolaterally from NA (top; 244.7 ± 31.3 average 
total cells; n = 5) and rostrocaudally from VII N (bottom; n = 4). Data 
shown as mean ± s.e.m. d, ChAT+ PiCo neurons in Chat-cre;Ai14 mice. 
e, Magnification of box in d. Arrowheads, Vglut2 mRNA in Chat-derived 
PiCo neurons. 


persisted following NMDA receptor blockade with 3-(2-carboxypiper- 
azin-4-yl)propyl-1-phosphonic acid (CPP; data not shown), but burst- 
ing was abolished following non-NMDA glutamatergic blockade with 
6-cyano-7-nitroquinoxaline-2,3-dione (CNQX; Fig. 2b). We conclude 
that inspiratory and postinspiratory activities are generated by glutamatergic, 
non-NMDA-dependent mechanisms, whereas the timing 
of inspiratory and postinspiratory bursts is established by GABAergic 
mechanisms. 

Inspiratory rhythm-generating neurons in the preBötC are derived 
from Dbx1-expressing progenitor cells. To investigate whether these 
neurons interact with PiCo, we used a tamoxifen-inducible transgenic 
line (Dbx1-cre-ERT2;Ai27) in which Dbx1-positive cells born after 
embryonic day (E)10.5 express channelrhodopsin-2. Photostimulation 
of preBötC neurons in horizontal slices from Dbx1-cre-ERT2;Ai27 
mice inhibited all recorded PiCo neurons in the presence of 2 µM 
norepinephrine alone and after the addition of strychnine (Fig. 2c). In 
the presence of gabazine, this light-evoked inhibition was eliminated. 
The blockade of GABAergic inhibition revealed that PiCo neurons 
also received excitatory input from the preBötC, which was blocked 
by CNQX (Fig. 2c). Thus, inspiratory activity involving Dbx1-derived 
neurons simultaneously excites and inhibits PiCo neurons via glutamatergic 
and GABAergic mechanisms, respectively; however, under 
normal conditions, GABAergic input dominates over the concurrent 
glutamatergic excitation from the preBötC (Fig. 2c). 

Because PiCo neurons co-express acetylcholine and glutamate (Fig. 1), 
we tested whether the postinspiratory rhythm depended on cholinergic 
mechanisms. Atropine, a muscarinic receptor antagonist, but not 
mecamylamine, a nicotinic receptor antagonist, depressed postinspir- 
atory burst frequency. However, postinspiratory bursting persisted in 
the presence of both blockers and returned to near baseline frequency 
when the concentration of norepinephrine was increased to 4 µM 
(Extended Data Fig. 6). Thus, the PiCo rhythm is modulated by, but 
not dependent on, cholinergic mechanisms. 

PiCo neurons were intrinsically sensitive to the µ-opioid receptor 
agonist [D-Ala², N-Me-Phe⁴, Gly⁵-ol]-enkephalin (DAMGO) 
(Extended Data Fig. 7). In horizontal slices, PiCo population bursts 
were nearly eliminated by treatment with 25 nM DAMGO, whereas 
burst frequency in the preBotC was only slightly decreased (Fig. 2d and 
Extended Data Fig. 8). This exquisite opioid sensitivity unambiguously 
differentiates PiCo from the previously described RTN/pFRG, a region 
that is thought to contain the network that generates active expiration* 
and is known to be insensitive to µ-opioid receptor activation!®. The 
peptide somatostatin (SST) also inhibited postinspiratory PiCo activity, 
but had little effect on preBötC activity (Fig. 2e and Extended Data 
Fig. 8). These data are consistent with the inhibition of postinspiration 
by SST in vivo'”. 

Optogenetic stimulation of ChAT-positive neurons always elicited 
postinspiratory bursts from the PiCo in horizontal slices. Such stimula- 
tion never evoked an inspiratory burst (Extended Data Fig. 9) or burst 
activity in intracellularly recorded nucleus ambiguus neurons (data not 
shown). Because of this specificity for postinspiratory activity, we used 
adult Chat-cre;Ai27 mice to stimulate the PiCo in vivo. Similar to the 
in vitro results, optogenetic activation of ChAT-positive neurons at the 
level of the PiCo (Fig. 3a–c) reliably evoked bursts in the cervical vagal 
nerve (cVN) (Fig. 3d and Extended Data Fig. 9). Photo-evoked post- 
inspiratory bursts delayed the subsequent inspiration (Fig. 3d, e and 
Extended Data Fig. 9). This delay was eliminated by bilateral injection 
of DAMGO into the PiCo (Extended Data Fig. 10). Thus, postinspi- 
ration has a mutually inhibitory relationship with inspiratory activity. 

To assess whether the PiCo is responsible for generating postin- 
spiratory motor output in vivo, we took advantage of the sensitivity of 
the PiCo to SST and DAMGO. Bilateral injections of SST or DAMGO 
into the PiCo in vivo (Fig. 3a–c) markedly reduced the duration and 
amplitude of spontaneous vagal postinspiratory bursts (Fig. 3f-h). 
Collectively, these results suggest that the PiCo is both necessary and 
sufficient for generating postinspiratory activity. 

Moreover, the PiCo seems to possess autonomous rhythmogenic 
properties. In horizontal slices, norepinephrine concentrations above 
2 µM generated ectopic PiCo population bursts that outpaced the 
preBotC rhythm (Fig. 4a and Extended Data Fig. 2). Ectopic bursts 


Figure 2 | Glutamatergic-cholinergic PiCo cells and role of synaptic inhibition. 
a, Intracellular PiCo recordings in horizontal 
slices from Vglut2-cre;Ai27 (n = 3) and Chat-cre;Ai27 
(n = 5) mice. Photostimulation after 
treatment with TTX depolarizes the membrane. 
b, Recordings from the PiCo and preBötC 
during progressive synaptic block (n = 5). 
Rhythms are unaffected in the presence of 
strychnine, synchronized in gabazine, and 
bursting is abolished by CNQX. I, inspiration; 
PI, postinspiration. c, Recording from a PiCo cell 
in a horizontal slice from a Dbx1-cre-ERT2;Ai27 
mouse during progressive application of synaptic 
blockers. The PiCo cell is inhibited during 
photo-evoked inspiratory bursts in the presence 
of norepinephrine alone or with the addition of 
strychnine, bursts during light stimulation with 
the addition of gabazine, and ceases to burst 
after the application of CNQX (500 ms light, 
40 sweeps, n = 4). d, PiCo bursting is eliminated 
by 25 nM DAMGO; representative bursts (n = 5). 
e, PiCo bursting is inhibited by 500 nM SST; 
representative bursts (n = 6). 
occurred in any phase of the inspiratory cycle except during preBötC 
bursts (Fig. 4b), consistent with photo-evoked PiCo bursts (Extended 
Data Fig. 9). 

To further test the possibility that PiCo is an autonomous rhythm 
generator, we separated the VRC into two adjacent rostral and caudal 
transverse slices (Fig. 4c). Together, these slices span a total length of 
1-1.1 mm of the rostrocaudal VRC, beginning with the caudal portion 
of the VII nucleus. Recordings from the caudal face of each transverse 
slice revealed a slower rhythm in the rostral transverse slice containing 
the PiCo than in the caudal slice containing the preBötC*”! (Fig. 4c). In 
the presence of 2 µM norepinephrine, both transverse slices exhibited 
regular rhythmic activities with similar burst frequencies (Fig. 4c and 
Extended Data Fig. 2) that persisted with the progressive addition of 
strychnine, gabazine, and CPP, but were abolished by application of 


[Figure 3 image. Panel titles: ventral surface; ChAT photostimulation; bilateral PiCo injection. Axes: stimulus phase; time post injection (min); postinspiratory burst duration, amplitude and frequency.] 

Figure 3 | Stimulation and inhibition of the PiCo in vivo. a, Ventral sites for bilateral 
photostimulation or injection of SST or DAMGO in vivo. Left, brightfield image; right, schematic. 
b, Left, Chat-cre;Ai27; right, brightfield. c, Dye spread (n = 7), centred at the PiCo (0–200 µm 
caudal from rostral nucleus ambiguus). Ipsilateral (grey), contralateral (teal) injections, bars ± 
max/min; pooled data in red, mean ± s.e.m. d, PiCo photostimulation in adult Chat-cre;Ai27 
[…] phase; purple bars, inspiratory phase delay). e, Quantification of inspiratory phase delay 
(n = 6); magnitude depends on stimulus phase (slope: 0.549, linear regression analysis). 
f, g, Injection of SST or DAMGO progressively decreases burst duration in the cVN but not the 
XII nerve. I, inspiration; PI, postinspiration. h, Postinspiratory burst duration, amplitude, and 
frequency following injection of SST (n = 3) or DAMGO (n = 4). Two-tailed paired t-test, 
*P < 0.05 compared to baseline; mean ± s.e.m. 


[Figure 4 image. Panels: a, b, horizontal slice (4 µM NE outpacing; burst count against postinspiration relative to inspiration); c, transverse slices (spontaneous; 2 µM NE); d, Vglut2-cre;Ai27 and Chat-cre;Ai27 (sweep overlay and average).] 

Figure 4 | The PiCo is an autonomous, rhythm-generating network. 
a, In the presence of 3–4 µM norepinephrine, the PiCo rhythm 
outpaces the preBötC rhythm, but the PiCo still produces bursts in the 
postinspiratory phase (grey bars) in horizontal slices. b, PiCo bursts 
occur in any phase except during inspiration (inspiratory peak = 0; 
count = 400 bursts; n = 4). c, Recordings from the PiCo and preBötC 
isolated in transverse in vitro slices; PiCo bursting is stimulated by 2 µM 
norepinephrine (n = 33). d, Light stimulation evokes a burst in rostral 
transverse slices from Vglut2-cre;Ai27 mice (91.3 ± 5.1% of stimulations, 
mean ± s.e.m., n = 4) and Chat-cre;Ai27 mice (91.9 ± 1.8% of 
stimulations, mean ± s.e.m., n = 6). 1.5-s light pulse. Top traces: 10 sweep 
overlay; bottom traces: sweep average. 


CNQX (Extended Data Fig. 4), similar to the findings in horizontal 
slices. Consistent with our histological characterizations, optogenetic 
stimulation of either Vglut2-cre;Ai27 or Chat-cre;Ai27 rostral slices 
evoked population bursts in the PiCo (Fig. 4d). This provides additional 
evidence that glutamatergic—cholinergic neurons are important for 
rhythm generation within the PiCo network. Furthermore, PiCo activ- 
ity was exquisitely sensitive to DAMGO and SST in isolated transverse 
slices (Extended Data Fig. 8). We conclude that the PiCo and preBotC 
can function as independent oscillators with similar rhythm-generating 
but distinct modulatory properties. 

As an excitatory rhythmogenic network, the PiCo may not only be 
involved in the control of breathing, but might also contribute to the 
generation of other postinspiratory behaviours such as swallowing and 
vocalization. Although we did not perform behavioural assays in this 
study, various types of postinspiratory burst waveforms were observed 
in the vagal nerve (Extended Data Fig. 10) that were similarly affected 
by manipulation of the PiCo, supporting the idea that this network 
might have a broad role in controlling postinspiratory activities. In this 
context it will be interesting to resolve the role of the PiCo in specific 
postinspiratory behaviours and to identify how the PiCo interacts with 
other neural networks such as the Kölliker-Fuse nucleus, a pontine 
structure that has been hypothesized to gate postinspiratory activity’, 
and the periaqueductal grey, a structure involved in vocalization and 
the control of postinspiration!” 

On the basis of our results, we propose a triple oscillator model in 
which the three phases of breathing—inspiration, postinspiration, and 
active expiration—are generated by three spatially distinct excitatory 
rhythmogenic microcircuits, the preBotC, PiCo, and pF, respectively, 
which are temporally coordinated by inhibitory interactions. The 
existence of discrete excitatory networks may facilitate the differential 
and dynamic control of ventilatory and non-ventilatory behaviours. 
Coupled oscillators”° have also been hypothesized to be involved in 
networks controlling locomotion”', scratching”, swimming” and the 
circadian clock”. Thus, this network organization may constitute a 
general principle of rhythm generation that promotes flexible control 
of complex biological processes. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS 

Animals. All experiments were performed with the approval of the Institutional 
Animal Care and Use Committee of the Seattle Children’s Research Institute. Mice 
were maintained with rodent diet and water available ad libitum in a vivarium with 
a 12-h light-dark cycle at 22°C. In this study, we used both CD1 Swiss mice and 
Cre reporter mice generated on a C57BL/6 background. Ai27 mice were bred to 
conditionally express Channelrhodopsin-2 (H134R) fused to tdTomato inserted in 
the ROSA26 locus (B6.Cg-Gt(ROSA)26Sortm27.1(CAG-COP4*H134R/tdTomato)Hze/J; The 
Jackson Laboratory)”*. Ai6 mice were bred to express the green fluorescent protein 
ZsGreen inserted in the ROSA26 locus (B6.Gt(ROSA)26Sortm6(CAG-ZsGreen1)Hze; 
The Jackson Laboratory). Similarly, Ai14 mice were bred to express the red 
fluorescent protein tdTomato inserted into the ROSA26 locus (Gt(ROSA) 
26Sortm14(CAG-tdTomato)Hze; The Jackson Laboratory). Cre-driver mice expressed 
Cre recombinase under the control of subtype-specific promoters, including 
Chat-cre (B6;129S6-Chattm2(cre)Lowl/J; The Jackson Laboratory) and Vglut2-cre 
(B6;Slc17a6tm2(cre)Lowl; Bradford Lowell). Dbx1-cre-ERT2 (Dbx1CreERT2 (refs 15, 26)) 
dams were bred with Ai27 males and pregnancies were timed and monitored. We 
intraperitoneally injected tamoxifen (25 mg/kg; from 10 mg tamoxifen (Sigma- 
Aldrich) dissolved per ml of corn oil) on embryonic day (E)10.5. Mice were 
typically born after 20 days of gestation. No method of randomization was used to 
determine how animals were allocated to experimental groups and the investigators 
were not blinded when analysing data in this study. No statistical methods were 
used to predetermine sample size. 

In vitro slice preparations. We dissected the ventral respiratory column (VRC) 
using three types of brainstem slice: (1) a caudal transverse slice that contains the 
preBötC as previously described¹¹; (2) a rostral transverse slice that encompasses 
the BötC and caudal portions of the VII nucleus; and (3) a horizontal slice that 
bilaterally isolates the VRC extending from the VII nucleus to the spinal cord. Slices 
were obtained at postnatal day (P)5-10 from CD1 and transgenic C57BL/6 mice. 
Animals were anaesthetized via rapid hypothermia on ice before quick decapita- 
tion at spinal cervical level C4-C5. The three slice types were differentiated by the 
cutting angle, plane and thickness of the slice. 

For transverse slices, the head was pinned in a tissue-culture dish filled with a 
silicone elastomer (Sylgard). Skin and connective tissue were removed, and fine 
scissors were used to cut along skull sutures to separate the interparietal region of 
the skull and expose the cerebellum. A one-sided razor was used to make a single 
cut between the inferior colliculus and cerebellum. The brainstem was isolated 
by removing the cerebellum in ice-cold, oxygenated (95% O2, 5% CO2) artificial 
cerebrospinal fluid (aCSF) containing (in mM): 128 NaCl, 3 KCl, 1.5 CaCl2, 
1 MgCl2, 24 NaHCO3, 0.5 NaH2PO4, and 30 D-glucose (pH 7.4, 305–312 mOsm). 
A slanted (~15° from vertical) agar block was secured on a specimen tray, and then 
the isolated brainstem and spinal cord preparation was glued with cyanoacrylate 
to the slanted portion of the agar such that the rostral end was facing upward and 
the dorsal side was glued to the agar. Serial transverse slices proceeded on a vibra- 
tome until visual landmarks became clear. Once the 4th ventricle was completely 
open, a 550-µm slice was taken to obtain the rostral transverse slice containing the 
PiCo. The caudal face of this slice was characterized by containing the rostral-most 
portion of the nucleus ambiguus and the caudal portion of the VII nucleus. The 
subsequent 550-µm slice isolated the caudal transverse slice. This caudal slice was 
identical to the well-established transverse slice known to contain the preBötC¹¹. 
From an individual animal we routinely obtained both the rostral and the caudal 
slice preparation and recorded from the caudal side of each of the transverse slices 
in the same recording chamber. 
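
As a worked illustration of the aCSF recipe given above, the sketch below converts the listed millimolar concentrations into grams per litre. The molar masses are standard anhydrous values; the source does not state which hydrate of each salt was used, so that choice, like the function names, is an assumption made for this example.

```python
# Anhydrous molar masses in g/mol. Assumption: the text does not say which
# hydrate of each salt was weighed out, so anhydrous values are used here.
MOLAR_MASS_G_PER_MOL = {
    "NaCl": 58.44,
    "KCl": 74.55,
    "CaCl2": 110.98,
    "MgCl2": 95.21,
    "NaHCO3": 84.01,
    "NaH2PO4": 119.98,
    "D-glucose": 180.16,
}

# aCSF composition in mM, as listed in the text.
ACSF_MM = {
    "NaCl": 128.0,
    "KCl": 3.0,
    "CaCl2": 1.5,
    "MgCl2": 1.0,
    "NaHCO3": 24.0,
    "NaH2PO4": 0.5,
    "D-glucose": 30.0,
}


def grams_per_litre(conc_mm, molar_mass=MOLAR_MASS_G_PER_MOL):
    """Convert a {compound: mM} recipe into {compound: g per litre}."""
    return {name: mm * 1e-3 * molar_mass[name] for name, mm in conc_mm.items()}


def grams_for_volume(conc_mm, litres):
    """Scale the per-litre masses to an arbitrary batch volume."""
    return {name: g * litres for name, g in grams_per_litre(conc_mm).items()}
```

Calling `grams_for_volume(ACSF_MM, 2.0)` would give the masses for a 2-litre batch under the same anhydrous assumption.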

To obtain the third type of slice preparation, the horizontal slice, the brainstem 
was mounted as described for the transverse slices. Serial coronal slices 
were taken from the rostral end of the brainstem until the facial nerves became 
visible, approximately 800–1,000 µm. The agar block was then removed from the 
specimen tray and reoriented so that the ventral surface of the brainstem faced 
upward and the blade advanced towards the rostral portion of the brainstem. 
The preparation was angled so that the ventral-most portion of the medulla was 
approximately level with the ventral-most portion of the spinal cord. The blade 
was positioned level to the rostroventral edge of the brainstem, stepped 900 µm 
downward (in the dorsal direction), and a single horizontal slice was cut retaining 
the ventral portion of the brainstem and spinal cord. The horizontal slice preserves 
long-range bilateral network interactions throughout the rostral-caudal axis of 
the VRC. 
In vitro electrophysiology. All slices were immediately transferred to the recording 
chamber, where they were superfused with aCSF at a rate of 10 ml/min, bubbled 
continuously in carbogen (95% O2 and 5% CO2) to oxygenate and adjust pH to 7.4, 
and allowed to equilibrate to experimental temperature (33 ± 2 °C, thermoneutral 
zone for mice). Population activity was obtained by raising the extracellular 
potassium concentration from 3 mM to 8 mM in two steps over 30 min. This is 
defined as ‘spontaneous’ conditions. 
Population activity was routinely recorded with borosilicate glass microelectrodes 
(World Precision Instruments) pulled on a Flaming/Brown micropipette 
puller (model P97, Sutter Instrument Co., <1 MΩ tip resistance) placed on the slice 
surface. Signals were amplified, filtered, and integrated as previously described””. 
Automated burst analysis software was used to determine burst frequency and 
amplitude from population recordings”®. 
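
The burst analysis software itself is not described here. As a rough illustration of what such an analysis involves, the following minimal sketch detects bursts in an integrated population trace by simple threshold crossing. The threshold rule, the function names and the returned fields are assumptions for this example and are not the authors' software.

```python
def detect_bursts(trace, dt_s, threshold):
    """Find bursts as contiguous runs of samples at or above `threshold`.

    trace     : list of integrated-activity samples (arbitrary units)
    dt_s      : sampling interval in seconds
    threshold : hypothetical detection level chosen by the user
    Returns a list of dicts with onset time, duration and peak amplitude.
    """
    bursts = []
    start = None
    for i, value in enumerate(trace):
        if value >= threshold and start is None:
            start = i  # rising edge: burst begins
        elif value < threshold and start is not None:
            segment = trace[start:i]  # falling edge: burst ends
            bursts.append({
                "onset_s": start * dt_s,
                "duration_s": (i - start) * dt_s,
                "amplitude": max(segment),
            })
            start = None
    if start is not None:  # trace ended while still above threshold
        segment = trace[start:]
        bursts.append({
            "onset_s": start * dt_s,
            "duration_s": (len(trace) - start) * dt_s,
            "amplitude": max(segment),
        })
    return bursts


def burst_frequency_hz(bursts, recording_s):
    """Mean burst frequency over the whole recording."""
    if recording_s <= 0:
        raise ValueError("recording_s must be positive")
    return len(bursts) / recording_s
```

A fixed threshold is the simplest possible rule; real recordings with drifting baselines would usually need baseline subtraction or an adaptive threshold first.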

The mapping experiment (Fig. 1a) was performed by placing a reference extra- 
cellular electrode on the PiCo and a second, mapping extracellular electrode on the 
contralateral side of a horizontal slice. The mapping electrode was systematically 
moved in 100-µm stereotaxic steps rostral, caudal, medial, and lateral to the PiCo. 
Postinspiratory burst amplitudes from the mapping electrode were normalized to 
that from the reference electrode to create a heat map of activity. 
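
The normalization step in the mapping experiment can be sketched as follows: each amplitude from the mapping electrode is divided by the amplitude recorded at the same time from the reference electrode, and the ratios are arranged on a grid of electrode offsets. The data layout (tuples of offsets and amplitudes) and the function names are assumptions for illustration.

```python
def normalised_map(measurements):
    """Normalise mapping-electrode amplitudes to the reference electrode.

    measurements: iterable of (dx_um, dy_um, mapping_amp, reference_amp),
    where dx_um/dy_um are offsets of the mapping electrode from the PiCo
    (hypothetical layout; the source only states 100-µm steps).
    Returns {(dx_um, dy_um): mapping_amp / reference_amp}.
    """
    ratios = {}
    for dx_um, dy_um, mapping_amp, reference_amp in measurements:
        if reference_amp <= 0:
            raise ValueError("reference amplitude must be positive")
        ratios[(dx_um, dy_um)] = mapping_amp / reference_amp
    return ratios


def to_grid(ratios):
    """Arrange the ratios as rows (sorted dy) by columns (sorted dx).

    Positions that were not sampled are filled with None.
    """
    xs = sorted({x for x, _ in ratios})
    ys = sorted({y for _, y in ratios})
    return [[ratios.get((x, y)) for x in xs] for y in ys]
```

The resulting grid can be passed to any plotting routine to draw the heat map; unsampled positions stay as `None` so they are not mistaken for zero activity.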

In horizontal and rostral transverse slices from transgenic mice expressing 
ChR2 in a subset of neurons, the PiCo (contralateral to the recording electrode) 
was light-stimulated using optical fibre (DPSSL Driver, blue 473 nm wavelength, 
200 µm diameter, <22 mW/mm² intensity) for 500 ms or 1.5 s. Collections of 
10 or 40 sweeps were recorded in succession (shown overlaid). 

Intracellular blind patch recordings were performed on PiCo neurons. 
Borosilicate glass patch electrodes (with filaments, World Precision Instruments) 
were pulled (P-97 Flaming/Brown micropipette puller, Sutter Instrument Co.) to 
a 6–12 MΩ resistance. Electrodes were filled with an intracellular patch solution 
containing (in mM): 140 K-gluconic acid, 1 CaCl2, 10 EGTA, 2 MgCl2, 4 Na2ATP, 
10 HEPES (pH 7.4). Whole cell patch-clamp recordings were obtained in current 
clamp configuration using a Multiclamp 700B amplifier (Molecular Devices) sam- 
pling at 20 kHz. Extracellularly recorded signal was sampled at 1.67 kHz, amplified 
10,000 times, filtered (low pass, 1.5 kHz; high pass, 250 Hz), rectified, and inte- 
grated using an electronic filter. Both extracellular and intracellular recordings 
were obtained with Clampex 10.0 (Molecular Devices). Recordings were stored 
on a computer for post-hoc analysis. 
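
The recordings described above were rectified and integrated with an electronic filter. For readers working with raw digitized traces, a software stand-in might look like the sketch below: full-wave rectification followed by a first-order leaky integrator. This swaps hardware for a digital filter, and the 50-ms time constant is an assumed placeholder, since the source gives no integration time constant.

```python
def rectify_and_integrate(samples, fs_hz, tau_s=0.05):
    """Full-wave rectify a signal and smooth it with a leaky integrator.

    samples : list of band-pass filtered voltage samples
    fs_hz   : sampling rate in Hz
    tau_s   : integrator time constant in seconds (assumed value,
              not taken from the source)
    Returns a list of the same length as `samples`.
    """
    if fs_hz <= 0 or tau_s <= 0:
        raise ValueError("fs_hz and tau_s must be positive")
    dt = 1.0 / fs_hz
    alpha = dt / (tau_s + dt)  # discrete first-order low-pass coefficient
    level = 0.0
    integrated = []
    for sample in samples:
        level += alpha * (abs(sample) - level)  # abs() is the rectifier
        integrated.append(level)
    return integrated
```

Shorter time constants follow burst onsets more closely, while longer ones give smoother envelopes; the appropriate value depends on the burst durations being measured.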

Receptor antagonists and neuromodulators were bath perfused during in vitro 
extracellular and intracellular recordings. All stock solutions were stored at -20°C 
in small-volume aliquots to avoid repetitive freezing and thawing. Strychnine 
(1 µM, glycine receptor antagonist, Sigma-Aldrich) and SR 95531 hydrobromide 
(gabazine, GABAA receptor antagonist, 10 µM, Tocris) were used to block 
inhibitory synaptic transmission. To further block all fast synaptic transmission, 
3-((±)2-carboxypiperazin-4-yl)propyl-1-phosphate (CPP, NMDA receptor antagonist, 
10 µM, Tocris) and 6-cyano-7-nitroquinoxaline-2,3-dione (CNQX, AMPA 
receptor antagonist, 20 µM, Alomone Labs, diluted in DMSO) were bath applied. 
To block action potentials, 1 µM tetrodotoxin (TTX, Sigma-Aldrich) was used. 
To block cholinergic receptors, 10 µM atropine, a muscarinic receptor antagonist 
(Sigma-Aldrich) and 1 µM mecamylamine hydrochloride, a nicotinic receptor 
antagonist (Sigma-Aldrich) were bath applied. To stimulate the PiCo rhythm, 
DL-norepinephrine hydrochloride (norepinephrine, 1–4 µM, Sigma-Aldrich) was 
used; and to inhibit the PiCo rhythm, [D-Ala², N-Me-Phe⁴, Gly⁵-ol]-Enkephalin 
(DAMGO, 1-300 nM, Sigma-Aldrich) and somatostatin (SST, 500 nM, Tocris) 
were applied. 

In vivo electrophysiology. Adult mice were prepared as described previously’. 
Chat-cre;Ai27 mice (P140-250) were anaesthetized with urethane (1.5 g/kg) and 
placed in a supine position, and the head was stabilized with ear bars. The trachea 
was exposed via a cervical midline incision and cannulated with a U-shaped tra- 
cheal tube. For the remainder of the surgery and experimental protocol, mice were 
allowed to spontaneously breathe humidified O2 (FiO2 = 100%). The rostral ends 
of the trachea and oesophagus were removed, followed by removal of the muscle 
and bone covering the ventral brainstem so that the vertebral and basilar arteries 
were visible. The dura and arachnoid membranes were removed followed by continuous 
perfusion of the ventral medullary surface with 95% O2/5% CO2-equilibrated 
aCSF solution at 37 ± 0.5°C. The hypoglossal nerve (XII) and cervical vagus 
nerve (cVN) were isolated and cut, and their activity was measured using a suction 
electrode containing aCSF. Signals were amplified, bandpass filtered (8 Hz-3 kHz), 
and digitized with a Digidata 1400 and pClamp 10 software (Molecular Devices). 

After completion of the surgery, mice were allowed to stabilize for 15 min before 
15-20 min of baseline respiratory activity was recorded. Using the vertebral and 
basilar arteries as landmarks (Fig. 3a–c), 200 µm diameter optical fibres coupled to 
a 447 nm DPSSL driver laser at intensity <230 mW/mm² were placed bilaterally on 
the ventral surface of the medulla above the region containing the PiCo. XII and 
cVN activity was recorded during 10-s episode files containing a 200-ms light pulse 
to stimulate Chat-cre-expressing cells. Inspiratory and postinspiratory (spontane- 
ous or light-evoked) activity was analysed using Clampfit 10 software (Molecular 
Devices). The phase of evoked cVN postinspiratory activity was determined as 
the fraction of the inspiratory cycle (Extended Data Fig. 9) or the fraction of the 
average duration of the preceding two inspiratory cycles (expected phase, Fig. 3e). 
The inspiratory phase duration during cycles containing a light-evoked PiCo burst 
was then divided by the expected phase to determine the phase delay (Fig. 3e). 
In some mice, the PiCo was photostimulated before and after bilateral injection 
of 5 µM DAMGO to assess the effect of DAMGO on the inspiratory phase delay 
(Extended Data Fig. 10). 
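
One possible reading of the phase analysis described above is sketched here. The expected cycle duration is taken as the mean of the two inspiratory cycles preceding the stimulated one; the stimulus phase is the stimulus latency from the last inspiratory onset divided by that expected duration; and the phase delay is the duration of the stimulated cycle divided by the expected duration. The use of inspiratory onset times as the cycle markers and the function name are assumptions.

```python
from bisect import bisect_right


def phase_metrics(insp_onsets_s, stim_time_s):
    """Return (stimulus_phase, phase_delay) for one light pulse.

    insp_onsets_s : sorted list of inspiratory burst onset times (s)
    stim_time_s   : time of the light-evoked burst (s)
    A phase delay of 1.0 means the stimulated cycle had the expected
    duration; values above 1.0 mean the next inspiration was delayed.
    """
    k = bisect_right(insp_onsets_s, stim_time_s) - 1  # cycle containing the stimulus
    if k < 2 or k + 1 >= len(insp_onsets_s):
        raise ValueError("need two cycles before and one onset after the stimulus")
    prev_cycle = insp_onsets_s[k] - insp_onsets_s[k - 1]
    prev_prev_cycle = insp_onsets_s[k - 1] - insp_onsets_s[k - 2]
    expected_s = (prev_cycle + prev_prev_cycle) / 2.0
    stimulus_phase = (stim_time_s - insp_onsets_s[k]) / expected_s
    phase_delay = (insp_onsets_s[k + 1] - insp_onsets_s[k]) / expected_s
    return stimulus_phase, phase_delay
```

Plotting `phase_delay` against `stimulus_phase` across many pulses would give the kind of relationship that a linear regression, as in Fig. 3e, could then summarize.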

To test whether PiCo activity is necessary for postinspiration in vivo, pulled 
micropipettes containing somatostatin (SST; 750 µM) or DAMGO (5 µM), and 
either Evans Blue or Fast Green to identify the injection site, were inserted 
(300–400 µm) bilaterally into the PiCo. Spontaneous postinspiratory burst amplitude, 
duration, and frequency were then quantified 3–5 min after a 250-nL injection 
of SST or DAMGO and compared (using Student’s t-test) to pre-injection values 
(GraphPad, Prism 5 software). Postinspiratory burst duration was determined 
by subtracting the duration of XII nerve inspiratory activity from the duration 
of the corresponding cVN burst. Postinspiratory amplitude was defined as the 
amplitude of cVN activity immediately following XII nerve inspiratory activity. 
Following experimental protocols, mice were perfused with 4% paraformaldehyde 
(PFA), and brainstems were extracted and cryoprotected (30% sucrose in PBS). 
Brainstems were then serially sectioned to identify sites of injection. 
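
The duration measure and the paired comparison described above can be illustrated with a short sketch. It computes postinspiratory duration as cVN burst duration minus XII burst duration, then the paired t statistic for pre- versus post-injection values. The analysis in the paper was run in GraphPad Prism; this standard-library version only returns the statistic and degrees of freedom, and the function names are assumptions.

```python
import math
from statistics import mean, stdev


def postinspiratory_duration(cvn_burst_s, xii_burst_s):
    """Postinspiratory duration = cVN burst duration - XII burst duration."""
    return cvn_burst_s - xii_burst_s


def paired_t(before, after):
    """Paired t statistic for matched pre/post values from the same animals.

    Returns (t, degrees_of_freedom). The p-value is not computed here;
    look it up from a t distribution with the returned degrees of freedom.
    """
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need at least two matched pairs")
    diffs = [a - b for a, b in zip(after, before)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("differences have zero variance")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1
```

A negative t indicates that the post-injection values are lower than baseline, which is the direction reported for burst duration and amplitude.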
Immunohistochemistry. 200-µm rostral transverse slices from Vglut2-cre;Ai14 
mice were fixed with 4% PFA for 1h and immunostained as whole mounts. Slices 
were washed in PBST (0.1-0.5% Triton X-100), blocked with 10% donkey serum 
in PBST overnight at 4°C, incubated for 2-3 days in primary antibody in blocking 
solution at 4°C, washed in PBST, incubated in secondary antibody in blocking solu- 
tion for 5-8h at room temperature, washed in PBST, counterstained with 0.01% 
DAPI (Life Technologies), and mounted in Fluoromount-G (SouthernBiotech). 
40-µm sagittal sections from Vglut2-cre;Ai14 mice were also immunolabelled for 
quantification. Mice were transcardially perfused with 4% PFA, and brainstems 
were postfixed in 4% PFA overnight. Isolated brainstems were transferred through 
increasing sucrose gradients (10-30%), embedded in OCT compound (TissueTek), 
frozen, and cryosectioned. Immunohistochemical labelling followed the same pro- 
tocol as for whole mounts with shortened incubation times. Primary antibodies 
included anti-ChAT (1:100, AB144P, Millipore) and anti-Cre recombinase (1:200, 
908001, BioLegend). Secondary antibodies were Alexa Fluor 568- or 647-conjugated 
(1:250, Life Technologies). Maximum intensity projections of optical slice 
z-stacks were acquired using a Zeiss 710 Quasar 34-channel LSCM (Carl Zeiss). 
Cre-labelled images were despeckled for background noise reduction. 

Cell counting. Maximum intensity projections of 20× optical slice z-stacks were 
collected −40 to 360 µm medial to the medial end of the nucleus ambiguus. ChAT+ 
and Cre+ or tdTomato+ cells were counted within this area, with the exception of 
ChAT+ cells that clearly belonged to the nucleus ambiguus or VII nucleus (distinguished 
by location and large cell size). Counts from each hemisphere were averaged 
for individual animals. Counts in the rostrocaudal direction were taken from 
sagittal slices 40–280 µm medial to the medial end of nucleus ambiguus, where 
PiCo cells are most abundant. ChAT+ and Cre+ or tdTomato+ cells were counted 
in 50-µm bins through the rostrocaudal extent and summed across the 240-µm 
span. For in vivo experiments, 50-µm serial transverse brainstem sections were 
cryopreserved and processed as described above. Sections were imaged through 
the region encompassing the injection site and noted for the presence of Evans Blue 
or Fast Green dye. Chat-cre expression identified the rostral end of the nucleus 
ambiguus in order to quantify the rostrocaudal location of injection sites relative 
to the nucleus ambiguus. Anatomical diagrams and coordinates were adapted from 
the Franklin and Paxinos adult mouse brain atlas*”. 
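
For illustration only, the binning and summation described under Cell counting could be scripted as below. This is a minimal Python sketch under stated assumptions, not the procedure used in the study: the function names, the representation of each slice as a list of rostrocaudal cell positions (in µm), and the choice to average the two hemispheres bin by bin are all hypothetical.

```python
from statistics import mean

BIN_WIDTH_UM = 50  # bin width stated in the text


def bin_counts(positions_um, start_um, stop_um, bin_width_um=BIN_WIDTH_UM):
    """Count cells in consecutive rostrocaudal bins of bin_width_um."""
    n_bins = -(-(stop_um - start_um) // bin_width_um)  # ceiling division
    counts = [0] * n_bins
    for x in positions_um:
        if start_um <= x < stop_um:  # ignore cells outside the counted extent
            counts[int((x - start_um) // bin_width_um)] += 1
    return counts


def rostrocaudal_profile(slices, start_um, stop_um):
    """Sum the per-bin counts over all sagittal slices in the medial span."""
    per_slice = [bin_counts(s, start_um, stop_um) for s in slices]
    return [sum(column) for column in zip(*per_slice)]


def average_hemispheres(left_profile, right_profile):
    """Average the two hemispheres of one animal, bin by bin (assumption)."""
    return [mean(pair) for pair in zip(left_profile, right_profile)]
```

Each inner list passed to `rostrocaudal_profile` stands for one sagittal slice within the 240-µm span; the result is one count per 50-µm bin.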

In situ hybridization. P8-11 mice were perfused with 4% PFA (0.1 M sodium 
phosphate, pH 7.0), and brainstems were post-fixed in 4% PFA (0.1 M sodium 
phosphate, pH 7.0) + 4% sucrose, overnight at 4°C. Isolated brainstems were 
submerged in 30% sucrose, embedded in OCT, frozen at −80°C, and cryosectioned
at 20 µm. Prior to hybridization, sections were fixed with 4% PFA/DEPC-PBS,
pH 7.0 at 4°C for 5 min, treated with proteinase K (1 µg/ml) for 10 min at room
temperature, fixed with 4% PFA/DEPC-PBS, pH 7.0 at 4°C for 5 min, and acetylated
for 10 min at room temperature. DIG-labelled Vglut2-Dig antisense RNA probe
(306 bp fragment of Vglut2 (1563-1869 bp, XM_006540602)) was hybridized onto
sections (0.8 µg/ml) at 42°C overnight. Following hybridization, sections were
incubated with RNase A (50 µg/ml, Invitrogen) for 30 min at 37°C. DIG Nucleic
acid detection kit (Roche) was used for RNA probe detection. The sections were 
incubated in anti-digoxigenin-AP conjugate (Roche Applied Science, sheep, 
1:1,000) for 1h at room temperature. Hybridized molecules were visualized after 
incubation in an enzyme-catalysed colour reaction with a solution of 5-bromo- 
4-chloro-3-indolyl phosphate (BCIP) and nitroblue tetrazolium salt (NBT) (Roche 
Applied Science). The sections were developed in the BCIP/NBT solution for 2h in 
the dark at room temperature. The enzyme-catalysed colour reaction was stopped 
with TE, pH 8.0 and fixed in 4% PFA, pH 7.0 for 20 min at 4°C. 

Statistics. All statistics were performed using GraphPad Prism 5. Numerical data
are reported as the mean ± s.e.m. Normality was determined by the D’Agostino-Pearson
normality test. For normally distributed data, statistical significance was
assessed by two-tailed paired Student's t-tests and two-way ANOVAs where
appropriate. Two-way ANOVAs were followed by Bonferroni post-hoc correction. For
data that were not normal, we used non-parametric two-tailed Mann-Whitney,
Kruskal-Wallis one-way ANOVA, and repeated measures Friedman tests where
appropriate. Kruskal-Wallis and Friedman tests were followed by Dunn's multiple
comparison post-hoc tests. Variance was similar between groups that were
statistically compared. Results were considered significant when P < 0.05. α was set less
than or equal to 0.05 for multiple comparison tests. Sample sizes were chosen on
the basis of previous studies. 
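
As a worked illustration of two quantities named above, the following minimal Python sketch computes the mean ± s.e.m. of a sample and applies a generic Bonferroni adjustment to a set of P values. It is not a reproduction of the GraphPad Prism analysis: the function names and example values are hypothetical, and the adjustment shown is the textbook form (each P multiplied by the number of comparisons), which may differ in detail from the post-hoc procedure Prism runs after an ANOVA.

```python
from math import sqrt
from statistics import mean, stdev


def mean_sem(values):
    """Return (mean, s.e.m.), where s.e.m. = sample s.d. / sqrt(n)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations to estimate s.e.m.")
    return mean(values), stdev(values) / sqrt(n)


def bonferroni(p_values, alpha=0.05):
    """Multiply each P by the number of comparisons (capped at 1).

    Returns a list of (adjusted P, significant at alpha) pairs.
    """
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    return [(p_adj, p_adj < alpha) for p_adj in adjusted]


# Hypothetical burst frequencies from five slices and three comparisons.
burst_frequencies = [1, 2, 3, 4, 5]
centre, sem = mean_sem(burst_frequencies)
decisions = bonferroni([0.01, 0.04, 0.5])
```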

Extended Data Figure 1 | Schematic of the horizontal slice from a
sagittal view that retains the medullary ventral respiratory column
in the brainstem. Dotted lines represent approximate boundaries of the
horizontal slice. Slice retains part of the superior olive (SO), and the entire
retrotrapezoidal nucleus/para-facial respiratory group (RTN/pFRG), facial
nucleus (VII N), Bötzinger complex (BötC), postinspiratory complex
(PiCo), nucleus ambiguus (NA), preBötzinger complex (preBötC), lateral
reticular nucleus (LRT), and the rostral and caudal ventral respiratory
groups (rVRG and cVRG, respectively). The slice also retains a portion
of the spinal cord and includes part of the phrenic motor nucleus
(approximately cervical segments 3 and 4). The slice does not contain
the dorsal portion of the medulla including the dorsal respiratory group
(DRG). Dorsal (D), ventral (V), rostral (R), caudal (C).

Extended Data Figure 2 | Norepinephrine dose response of preBötC
and PiCo rhythms in horizontal and transverse slices. The frequency
of the PiCo rhythm (black) is highly sensitive to the application of low
concentrations of norepinephrine while the preBötC rhythm (purple)
stays relatively constant in both types of slice preparations. a, In horizontal
slices the PiCo rhythm is slow under spontaneous conditions (n = 10),
the two rhythms have similar burst frequencies in the presence of 2 µM
norepinephrine (n = 6), and the PiCo rhythm significantly outpaces the
preBötC rhythm under higher concentrations of norepinephrine (n = 4,
3-4 µM norepinephrine). b, Similarly, when isolated in transverse slices,
the PiCo rhythm has a slow frequency under spontaneous conditions
(n = 10), and the preBötC and PiCo have similar frequencies at 2 µM
norepinephrine (n = 7; mean ± s.e.m.). Two-way ANOVA followed by a
Bonferroni post-hoc test. ****P < 0.0001 comparing PiCo to preBötC,
®P < 0.05 compared to baseline (Spon.), *P < 0.05 compared to 2 µM
norepinephrine.

Extended Data Figure 3 | Nucleus ambiguus neurons lack Vglut2-cre expression. High magnification view at the level of PiCo from a Vglut2-cre;Ai6
(ZsGreen1; green) mouse immunolabelled with ChAT antibody (magenta) and Cre antibody (white). Note lack of green Vglut2-cre expression in the
nucleus ambiguus. Scale bar, 100 µm.

Extended Data Figure 4 | Progressive synaptic blockade in horizontal
and transverse slices. Left, graphs comparing frequency and normalized
burst area between horizontal (n = 5) and paired transverse slices (n = 5)
after the application of strychnine and gabazine. In both horizontal
and paired transverse slices, PiCo and preBötC rhythms have nearly
identical burst frequencies in the presence of gabazine (top). The burst
area of both rhythms also significantly increases with the application of
gabazine in both slice preparations (bottom). Two-way ANOVA followed
by a Bonferroni post-hoc test. °P < 0.05 compared to baseline (2 µM
norepinephrine). Right, synaptic blockers were progressively perfused
over paired transverse slices at 10-min intervals. Both PiCo and preBötC
rhythms persist in the presence of 1 µM strychnine, 10 µM gabazine, and
10 µM CPP. Population rhythms ceased in the presence of 20 µM CNQX,
indicating that both rhythms are excitatory (n = 5). The asterisk denotes a
characteristic sigh in the preBötC trace.

Extended Data Figure 5 | Peri-event interval between preBötC and PiCo
bursts during inhibitory block in the horizontal slice. a, Peri-event
interval, time between peak of preBötC and PiCo bursts, is constant
in strychnine; however, gabazine initiates progressive synchronization
between rhythms (shown here in a representative experiment).
b, Average peri-event intervals at baseline and after sequential application
of strychnine and gabazine (n = 6, mean ± s.e.m.). Repeated measures
Friedman test followed by Dunn’s multiple comparisons post-hoc test.
*P < 0.05.

Extended Data Figure 6 | Blocking muscarinic and nicotinic
acetylcholine receptors does not abolish the PiCo rhythm. a, Raw
population bursts from PiCo and contralateral preBötC with the
progressive addition of 1 µM mecamylamine (nicotinic receptor
antagonist), 10 µM atropine (muscarinic receptor antagonist), and 4 µM
norepinephrine. b, The left two graphs show n = 5 experiments in which
atropine was applied first, and the right graphs illustrate n = 3 experiments
in which mecamylamine was applied first. Blockade of muscarinic
receptors results in a larger decrease in PiCo burst frequency than
blocking nicotinic receptors, while preBötC frequency does not change
significantly (top graphs). Blockade of muscarinic receptors increases the
amplitude of PiCo bursts (bottom graphs). The PiCo rhythm persists after
concurrent blockade of both types of acetylcholine receptors, and PiCo
burst frequency rebounds to near baseline levels when an additional 2 µM
norepinephrine is applied (total 4 µM norepinephrine; top graphs; mean
± s.e.m.). Two-way ANOVA followed by a Bonferroni post-hoc test.
**P < 0.01, *P < 0.05 comparing preBötC to PiCo, °P < 0.05 compared to
baseline (2 µM norepinephrine).

Extended Data Figure 7 | Synaptically isolated PiCo neurons decrease
firing frequency in the presence of DAMGO. a, Top traces show
intracellular recordings from PiCo cells with concurrent extracellular
preBötC population activity from a horizontal slice under 1.M
norepinephrine baseline conditions. Bottom traces show the same
recordings after blocking fast synaptic transmission (1 µM strychnine,
10 µM gabazine, 10 µM CPP, 20 µM CNQX) to synaptically isolate PiCo
neurons. Application of 10 nM DAMGO decreases the cell’s intrinsic firing
frequency. b, Quantified data show that DAMGO significantly decreases
action potential (AP) firing frequency of synaptically isolated PiCo
neurons in both horizontal slices (black dots) and transverse PiCo slices
(grey dots) (two-tailed paired t-test, *P < 0.05; n = 5).

Extended Data Figure 8 | Differential PiCo and preBötC population
responses to DAMGO and SST in horizontal and transverse slices.
a, After the application of 25 nM DAMGO, preBötC burst frequency
decreases only slightly (n = 5), whereas PiCo bursting is nearly eliminated.
b, Similar to results observed in horizontal slices, the PiCo rhythm is
eliminated by 25 nM DAMGO in transverse slices that isolate PiCo and
preBötC in the presence of 2 µM norepinephrine (n = 5). Periodic large
amplitude bursts in the bottom preBötC trace are fictive sighs. c, DAMGO
dose response of normalized preBötC and PiCo burst frequency in
transverse slices, illustrating the differential sensitivity of the PiCo and
preBötC rhythms to DAMGO; burst frequency values are normalized to
baseline frequency in 2 µM norepinephrine (mean ± s.e.m., n = 8 with
minimum replicates of 4 for each location and concentration). d, The PiCo
rhythm is selectively and transiently inhibited by the application of 500 nM
SST whereas the preBötC rhythm persists in horizontal slices. Graph
shows normalized average burst frequencies of both rhythms at baseline,
1.5-3.5 min after SST application, and 8-10 min after SST application
(n = 6). e, Similar to the horizontal slice, SST application results in a robust
inhibition of PiCo bursting in paired transverse slices. f, Similar to
d, compiled normalized burst frequencies for n = 6 transverse slices before
and after SST application (mean ± s.e.m.). Two-way ANOVA followed by
a Bonferroni post-hoc test. ****P < 0.0001 comparing PiCo to preBötC,
®P < 0.05 compared to baseline.

Extended Data Figure 9 | Light stimulation of cholinergic cells
evokes postinspiratory activity in horizontal slices and in vivo. a, Two
population electrodes were placed at the level of PiCo (black dot and
trace) and contralateral preBötC (purple dot and trace) in a horizontal
slice from a Chat-cre;Ai27 mouse. Under spontaneous conditions (no
norepinephrine), cholinergic neurons expressing channelrhodopsin-2
were light activated with an optical fibre (labelled ‘light’) placed over the
PiCo ipsilateral to the preBötC electrode. PiCo population bursts were
triggered upon the onset of a 1.5-s light pulse whereas no bursts were light
evoked in the preBötC (n = 6). Figure shows 10 traces overlaid for each
electrode with averaged traces below from a representative experiment.
b, Photo-stimulating PiCo in adult anaesthetized Chat-cre;Ai27 mice
reliably triggers cVN bursts. Figure shows 10 traces overlaid with averages
below of cVN and XII activity during a 200-ms light stimulation of PiCo.
c, Postinspiratory bursts can be photo-evoked both in vivo (n = 6) and
in vitro (n = 6) at any phase except for during inspiration and just before
inspiration (bottom left) owing to the inspiratory phase delay that occurs
when PiCo is stimulated (mean ± s.e.m., bottom right).

Extended Data Figure 10 | Elimination of phase delay by DAMGO
and diversity of postinspiratory waveforms in vivo. a, Injection of
5M DAMGO into the PiCo eliminates the phase delay elicited by
photostimulation of PiCo in Chat-cre;Ai27 mice. A representative
experiment showing cVN and XII recordings during a 200-ms light pulse
before and after injection of PiCo with DAMGO (left; grey bars, expected
phase; purple bars, inspiratory phase delay) and the average inspiratory
phase delay (right) (mean ± s.e.m., two-tailed paired t-test, **P < 0.01;
n = 6). b, Diverse postinspiratory vagal waveforms were recorded in vivo.
Five examples of cVN (black) and XII (purple) recordings (overlaid) show
that postinspiratory activity can vary from large decrementing patterns to
small short bursts, potentially representing the neural basis of a variety of
postinspiratory behaviours.

Synchronized cycles of bacterial lysis for in vivo delivery

M. Omar Din, Tal Danino, Arthur Prindle, Matt Skalak, Jangir Selimkhanov, Kaitlin Allen, Ellixis Julio, Eta Atolia,
Lev S. Tsimring, Sangeeta N. Bhatia & Jeff Hasty


The widespread view of bacteria as strictly pathogenic has given way 
to an appreciation of the prevalence of some beneficial microbes 
within the human body! >. It is perhaps inevitable that some bacteria 
would evolve to preferentially grow in environments that harbour 
disease and thus provide a natural platform for the development of 
engineered therapies* °. Such therapies could benefit from bacteria 
that are programmed to limit bacterial growth while continually 
producing and releasing cytotoxic agents in situ’-!°. Here we 
engineer a clinically relevant bacterium to lyse synchronously at 
a threshold population density and to release genetically encoded 
cargo. Following quorum lysis, a small number of surviving 
bacteria reseed the growing population, thus leading to pulsatile 
delivery cycles. We used microfluidic devices to characterize the 
engineered lysis strain and we demonstrate its potential as a drug 
delivery platform via co-culture with human cancer cells in vitro. As 
a proof of principle, we tracked the bacterial population dynamics 
in ectopic syngeneic colorectal tumours in mice via a luminescent 
reporter. The lysis strain exhibits pulsatile population dynamics 
in vivo, with mean bacterial luminescence that remained two orders 
of magnitude lower than an unmodified strain. Finally, guided by 
previous findings that certain bacteria can enhance the efficacy of 
standard therapies"', we orally administered the lysis strain alone 
or in combination with a clinical chemotherapeutic to a syngeneic 
mouse transplantation model of hepatic colorectal metastases. We 
found that the combination of both circuit-engineered bacteria 
and chemotherapy leads to a notable reduction of tumour activity 
along with a marked survival benefit over either therapy alone. 
Our approach establishes a methodology for leveraging the tools 
of synthetic biology to exploit the natural propensity for certain 
bacteria to colonize disease sites. 

In order to control population levels and facilitate drug delivery using 
bacteria, we engineered a synchronized lysis circuit (SLC) using cou- 
pled positive and negative feedback loops that have previously been 
used to generate robust oscillatory dynamics!*"*. The circuit (Fig. 1a) 
consists of a common promoter that drives expression of both its own
activator (positive feedback) and a lysis gene (negative feedback). 
Specifically, the luxI promoter regulates production of autoinducer 
(AHL), which binds LuxR and enables it to transcriptionally activate 
the promoter. Negative feedback arises from cell death that is triggered 
by a bacteriophage lysis gene (φX174 E) which is also under control of
the luxI promoter'*"!°. AHL can diffuse to neighbouring cells and thus 
provides an intercellular synchronization mechanism. 
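
The interplay of the two feedback loops can be illustrated with a toy simulation. The Python sketch below is a deliberately crude caricature and is not the computational model referred to in this Letter (which is described in its Supplementary Information): it assumes exponential growth, AHL production proportional to population size, first-order AHL clearance, and an abrupt lysis event with a signal reset once AHL crosses a threshold. All parameter names and values are hypothetical and carry arbitrary units.

```python
def simulate_slc(steps=2000, dt=0.01, growth_rate=1.0, ahl_rate=1.0,
                 ahl_clearance=0.5, threshold=5.0, survival=0.05, n0=0.1):
    """Toy threshold-triggered lysis cycle (forward Euler integration).

    Returns the population trajectory and the times of the lysis events.
    """
    n, ahl = n0, 0.0
    population, firing_times = [], []
    for i in range(steps):
        # Growth between lysis events.
        n += growth_rate * n * dt
        # AHL is produced by the cells and cleared (e.g. by flow).
        ahl += (ahl_rate * n - ahl_clearance * ahl) * dt
        if ahl >= threshold:
            # Quorum threshold reached: most cells lyse, a few survive.
            n *= survival
            # Assumption: the accumulated signal is washed out at firing.
            ahl = 0.0
            firing_times.append(i * dt)
        population.append(n)
    return population, firing_times


population, firing_times = simulate_slc()
intervals = [b - a for a, b in zip(firing_times, firing_times[1:])]
```

In this caricature the interval between lysis events is set mainly by the time the survivors need to regrow and re-accumulate the signal, so changing `growth_rate`, `survival` or `threshold` changes the period.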

The bacterial population dynamics arising from the synchronized 
lysis circuit can be conceptualized as a slow build-up of the signalling
molecule (AHL) to a threshold level, followed by a lysis event that
rapidly prunes the population and enables the release of bacterial con- 
tents (Fig. 1b). After lysis, a small number of remaining bacteria begin 
to produce AHL anew, allowing the ‘integrate and fire’ process to be 
repeated in a cyclical fashion. We used microfluidic devices to observe 
growth and lysis with the fluorescent protein superfolder GFP (sfGFP) 
as a proxy for circuit dynamics in attenuated Salmonella enterica subsp. 
enterica serovar Typhimurium (Supplementary Videos 1 and 2). We 
observed periodic lysis events characterized by peaks in the fluorescent 
reporter expression that correspond to population lysis (Fig. 1c). The 
fraction of lysed cells remains consistent across subsequent cycles, suggesting
that lysis and survival occur in a stochastic manner (Extended
Data Fig. 1a, b). Given the ultimate goal of implementation in an
in vivo microenvironment characterized by variable growth conditions,
we tested a range of incubation temperatures (36°C to 40°C) and perfusion
rates (100 µm s⁻¹ to 200 µm s⁻¹), measuring an average period of
3 h across all conditions (Fig. 1d). These findings demonstrate that the
SLC has the capacity to generate robust cycles of bacterial lysis in our
microfluidic devices across a spectrum of environmental fluctuations
that is likely to exist in an in vivo context.

Figure 1 | Construction and characterization of the SLC. a, The circuit
contains an activator’? and lysis plasmid. When the population reaches
the quorum threshold at a critical AHL concentration, the luxI promoter
drives the transcription of gene E for lysis, luxI, and sfGFP or luxCDABE
as the reporter module. The luxI or the tac promoter also drives the
transcription of the therapeutic gene for the stabilized circuit used in vivo.
LuxR in this system is driven by the native luxR promoter. b, The main
stages of each lysis cycle from seeding to quorum ‘firing’. Shown below
the schematic depictions are typical time series images of the
circuit-harbouring cells undergoing the three main stages of quorum firing in
a microfluidic growth chamber”. c, Fluorescence profile of a typical
microfluidic experiment. The estimated cell population trajectory
reveals that lysis events correspond to peaks of sfGFP fluorescence.
d, Period as a function of estimated flow velocity in the media channel
of the microfluidic device and environmental temperature. Error bars
indicate ± 1 s.d. for 13-50 peaks. These experiments were performed with
strain 1, see Supplementary Information for complete strain information.

Figure 2 | Computational modelling and
tunability. a, The model consists of intracellular
variables (lysis protein E and LuxI concentrations)
and extracellular variables (colony size and AHL
concentrations). A time series of colony size
(black), colony AHL (blue), intracellular LuxI
(green) and lysis protein concentrations (red) are
shown on the right. b, The region in the model
parameter space for ClpXP-mediated degradation
(see Supplementary Information) and flow where
the model output is oscillatory increases with
higher production and degradation terms.
c, Results from the computational model
showing the ability to tune the oscillatory period
by varying ClpXP-mediated degradation of LuxI.
d, Fluorescence profiles showing lysis oscillations
for LuxI ssrA (black, strain 2) and LuxI non-ssrA
(blue, strain 1) tagged versions of the circuit. See
Supplementary Information for complete model
information.

The emergence of bacterial therapies in synthetic biology has accen- 
tuated the need for predictive modelling. This need stems from a bot- 
tleneck created by a difference in the timescales for bacterial cloning 
versus animal experiments; the circuits required for candidate therapies 
can be created much faster than they can be tested in vivo. Therefore, 
in order to quantitatively characterize the SLC concept before testing 
in animal models, we developed a computational model (Fig. 2a and 
Supplementary Information) to define an optimal strategy for subse- 
quent testing in a lower-throughput animal model setting. We found 
that high production and degradation rates of the feedback-controlling 
proteins resulted in a wider domain of oscillatory dynamics in the 
parameter space (Fig. 2b). This model is consistent with our obser- 
vations that oscillations in S. Typhimurium were more robust than 
in Escherichia coli, in which rates of protein production and degrada- 
tion were previously found to be lower’® (Extended Data Fig. 1c and 
Supplementary Video 3). As the ability to manipulate circuit behaviour 
enhances the versatility of the system, we explored the tunability of 
the lysis period by adding an ssrA degradation tagging sequence on 
the LuxI protein. Consistent with model predictions, we observed an 
increased period and colony firing amplitude when tracking bacterial 
population dynamics (Fig. 2c, d and Extended Data Fig. 1d). The SLC 
thus enables tuning of the period and magnitude of delivery, which will 
be necessary for eventual application of this platform in the complex 
and fluctuating conditions present in vivo. 

To incorporate a cytotoxic payload into the SLC strain, we added 
expression of Haemolysin E, encoded by hlyE of E. coli, which has been 
tested as a pore-forming anti-tumour toxin!”. We initially confirmed 
the capability of the circuit to release intracellular contents by visualizing
released sfGFP with a small microfluidic sink located beneath the
growth chamber (Extended Data Fig. 2a-c). Then, to visualize bacterial
lysis and killing of cancer cells in vitro via HlyE, we engineered a
microfluidic device so that cancer cells adhere inside a growth channel 
that is flanked by smaller bacterial growth chambers, which permits 
simultaneous single-cell visualization of bacterial lysis and cancer cell 
death (Extended Data Fig. 2d). After co-culturing human cervical 
cancer HeLa cells with S. Typhimurium harbouring the SLC circuit,
we observed HeLa cell death upon the onset of bacterial lysis, indicating 
efficient toxin release (Fig. 3a, b and Supplementary Videos 4 and 5). 
Complete cell death occurred in the growth channel within ~111 min 
of initial sfGFP fluorescence (Fig. 3c). Thus, the SLC bacteria were 
capable of releasing HlyE at levels necessary to kill tumour-derived 
cells. 

We assessed the toxicity of released SLC or control bacterial contents 
in batch culture. As anticipated, we found that HeLa cells exposed to 
supernatant from a culture of the SLC bacteria bearing the hlyE module 
exhibited almost complete loss of viability (Fig. 3d), whereas the viabil- 
ity of HeLa cells exposed to supernatants of bacteria bearing the hlyE 
module without the SLC and equivalent dose of non-payload bearing 
SLC bacteria were only slightly affected (~15%). We concluded that 
bacterial lysis allowed for efficient HlyE release in vitro and that natu- 
ral intracellular bacterial contents do not significantly affect HeLa cell 
viability. We further investigated the delivery characteristics of the SLC 
bacteria with hlyE by seeding variable amounts of circuit-harbouring 
bacteria with HeLa cultures in well plates. We observed that the time to 
HeLa cell death following initial seeding increased with lower bacterial 
seeding volumes, presumably resulting from the extended time needed 
for bacteria to reach the quorum threshold (Fig. 3e and Supplementary 
Video 6). Initial seeding with a larger volume of bacteria resulted in 
increased firing rates which corresponded to shorter HlyE exposure 
times until cell death, consistent with a greater magnitude of lysis and 
payload release, although the cumulative toxicity threshold appears to 
be similar in all cases (Fig. 3f). On the basis of these observations, the 
seeding size of the bacterial population can be adjusted to determine 
the initial timing and release characteristics of the circuit. 
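
The reasoning that smaller seeding volumes delay the first firing can be made concrete with a back-of-the-envelope calculation. The sketch below assumes simple exponential growth with a fixed doubling time and a fixed quorum size, so that the time to quorum is the doubling time multiplied by log2(quorum size / seed size). These assumptions, the function name and the example numbers are illustrative only and are not taken from the experiments reported here.

```python
from math import log2


def time_to_quorum(seed_size, quorum_size, doubling_time_h):
    """Hours of exponential growth needed to go from seed_size to quorum_size."""
    if not 0 < seed_size <= quorum_size:
        raise ValueError("seed_size must be positive and not exceed quorum_size")
    return doubling_time_h * log2(quorum_size / seed_size)


# Hypothetical comparison: a tenfold smaller seed needs log2(10) extra doublings.
large_seed_delay = time_to_quorum(1.0, 8.0, doubling_time_h=0.5)
small_seed_delay = time_to_quorum(0.1, 8.0, doubling_time_h=0.5)
```

Under these assumptions each tenfold reduction in seeding size adds about 3.3 doubling times before the first lysis event, which is in line with the qualitative trend described above.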

We used a luciferase reporter to monitor bacterial population 
dynamics in grafted syngeneic colorectal tumours in mice. To minimize 
the extent of plasmid loss in the absence of antibiotic selection in vivo, 
we incorporated previously described stabilizing elements for plasmid 
retention and segregation into the SLC strain'*”. Additionally, we 
placed both the payload and luxCDABE genes (the in vivo reporter 
module) under the luxI promoter as an indicator of hlyE production
and quorum firing via bacterial luminescence (Fig. 1a). Using a sub- 
cutaneous model of colorectal cancer (MC26 cell line) in immuno- 
competent mice, we intratumorally injected a strain of SLC bacteria 
(SLC-hly). We observed pulsatile bacterial population dynamics 
within the tumour (Fig. 4a-c and Extended Data Fig. 3a, b) using 
in vivo imaging technology”’, consistent with the design and in vitro 
characterization (Fig. 2). The end luminescence intensity was on
average ~300-fold lower than the constitutive control strain, indicating
a significant decrease in bacterial population levels within the tumour
(Extended Data Fig. 3c).

Given the ability to engineer bacterial population dynamics in
tumour grafts, we leveraged the versatility of the SLC bacteria as a
delivery system to compare different classes of previously developed
payloads. In addition to the haemolysin strain that was characterized in
microfluidic devices, we created two additional SLC strains expressing
genes to activate a host immune response (via T-cell and dendritic cell
recruitment, using mouse CCL21) or trigger tumour cell apoptosis
(using the cell death domain of Bit1 fused to the tumour-penetrating
peptide iRGD, or CDD-iRGD)**”*. Upon intratumoral injection, the
immune recruitment strain elicited the strongest effect on tumour
growth when compared to the haemolysis or apoptotic strains (Fig. 4d).
We observed that an equal mixture of the three strains generated a
stronger response than any single strain (Fig. 4d and Extended Data
Fig. 3e-g), and on this basis we elected to pursue the ‘triple-strain’ dose
for further testing in order to minimize animal usage. In a side-by-side
comparison, we observed that the tumour response to SLC triple-strain
(SLC-3) injections was significantly larger than the response to
unmodified bacteria (Fig. 4e). Upon necropsy, histopathological analysis of
remnant tumours was performed for mice treated with the SLC-3
strains, chemotherapy or unmodified bacteria. In mice treated with
SLC-3 and non-circuit bacterial strains, robust staining of bacteria
was observed by anti-Salmonella antibodies, showing localization of
Salmonella within tumours. TUNEL staining indicated higher levels
of apoptosis and cell death in SLC-3 treated tumours (Extended Data
Fig. 4).

As a first step towards monitoring the effect of bacterial injections
on the host, we compared how the triple-strain system affected body
weight when administered intratumorally and intravenously, as the
administration route affects bacterial localization (Extended Data
Fig. 3d). We found that treatment with the SLC strains generated
the same weight change as unmodified bacteria when administered
intratumorally (Fig. 4f). However, intravenous administration of the
SLC conferred a greater health benefit on the basis of observations
that SLC strains producing constitutive therapy were better tolerated
than unmodified bacteria or non-SLC strains producing constitutive
therapy (Fig. 4g). Although further targeted studies are required to
systematically explore the effect of these bacteria on host health, these
preliminary experiments suggest that the SLC design can reduce the
burden of bacterial injections.

Figure 3 | In vitro co-culture. a, Schematic of
the microfluidic co-culture with cancer cells and
bacteria. Fluidic resistance was modified in this
chip to achieve stable near-stagnant flow reduction
to allow for cancer cell adherence and for diffusion
of released therapeutic from the trap to the channel
(see Supplementary Information). b, Frames from
the co-culture time series sequentially visualizing

Figure 4 | In vivo bacterial dynamics, effect on tumours and tolerability
in a subcutaneous tumour model. a, In vivo imaging over time of a
mouse bearing two hind flank tumours injected once with the stabilized
SLC-hly strain (strain 8). b, Single tumour density map trajectories of
bacterial luminescence (relative to luminescence at 0 h) for the SLC-hly
strain (strain 8). Data for each axis represents separate experiments.
c, Single tumour density map trajectories of bacterial luminescence for
the genomically integrated constitutively luminescent strain (strain 9).
Intratumoral injection resulted in over 35-fold higher post-injection
luminescence compared to intravenous injection (Extended Data Fig. 3d).
d, Average relative tumour volume over time for subcutaneous tumour
bearing mice injected with SLC-hly (red, strain 10), SLC-cdd (green, strain
14), SLC-ccl21 (blue, strain 15), and all together (SLC-3) (black). Bacteria
were injected intratumorally on days 0, 2, 6, 8, and 10 (black arrows)
(****P < 0.0001, two-way ANOVA with Bonferroni post-test, n = 14-17
tumours, error bars show s.e.). e, Average relative tumour volume over
time for mice with subcutaneous tumours injected with the SLC-3 strains
(black, strains 10, 14 and 15) and the no-plasmid control (magenta, strain 7).
Bacteria were injected intratumorally on days 0, 2, 6, and 10 (black arrows)
(****P < 0.0001, two-way ANOVA with Bonferroni post-test, n = 18-19
tumours, error bars represent s.e.). f, Average relative body weight over
time for mice with subcutaneous tumours injected with the SLC-3 strains
(black, strains 10, 14, and 15) and the no-plasmid control (magenta,
strain 7). Bacteria were injected intratumorally on days 0, 2, 6, and 10
(black arrows) (n = 10 mice for both cases, error bars represent s.e.).
g, Average relative body weight over time for subcutaneous tumour-bearing
mice with a single intravenous injection of the SLC + constitutive
hlyE (turquoise, n = 9 mice, strain 11), a non-SLC strain with constitutive
hlyE (orange, n = 5 mice, strain 12), or the no-plasmid control strain
(magenta, n = 9 mice, strain 7) (***P < 0.001, two-way ANOVA with
Bonferroni post-test, error bars represent s.e.).

Figure 5 | In vivo testing in an experimental model of colorectal
metastases in the liver via oral delivery of bacteria. a, Schematic of
the experimental syngeneic transplantation model of hepatic colorectal
metastases in a mouse, with the dosing schedule of either engineered
bacteria (SLC-3) or a common cytotoxic chemotherapeutic, the
antimetabolite 5-FU. The SLC-3 strains were delivered orally. 5-FU was
delivered via intraperitoneal injection. b, Relative body weight over time
for the mice with hepatic colorectal metastases fed with the SLC-3 strains
(blue), injected with 5-FU chemotherapy (red), or a combination of the
two (green). Error bars indicate ± 1 s.e. for 5-7 mice. c, Median relative
tumour activity, measured via tumour cell luminescence using in vivo
imaging, for the chemotherapy and SLC-3 cases from b. d, Median relative
tumour activity for the combination therapy case from b. Error bars for
c and d indicate the interquartile ranges for 5-7 mice. The dashed line
marks relative tumour activity of 0.70. e, Fraction of mice from the
cases in b which respond with 30% reduction of tumour activity over
time. f, Fraction survival over time for the mice in b (**P < 0.01, log rank
test; n = 5-7 mice).

To explore a proof-of-principle for the application of our circuit in 
the context of in vivo tumours, we examined the efficacy of our sys- 
tem in an experimental syngeneic transplantation model of colorectal 
metastases within the liver. We had previously established that oral 
delivery of these bacterial strains led to safe and efficient colonization 
of hepatic colorectal metastases (see Methods), and that mice toler- 
ated repeated dosing without overt adverse effects (Fig. 5a, b). In 
the context of bacteria-based therapeutic candidates, previous studies 
have shown that anaerobic bacteria can occupy avascular tumour 
compartments where chemotherapy is thought to be ineffective due to poor 
drug delivery. Thus a synergistic effect may arise when bacteria are 
used to deliver drugs to the necrotic core of a tumour, while standard 
chemotherapy is used for the vascularized regions. Inspired by this 
paradigm, we tested the combination of SLC-3 bacteria with a common 
clinical chemotherapy of 5-fluorouracil (5-FU). Tumours exhibited 
similar growth trajectories in response to repeated oral delivery 
of either the bacterial therapy alone, or two i.v. doses of 5-FU on day 0 
and day 21 (Fig. 5c). In contrast, combination of these two applica- 
tions led to a marked decrease in tumour activity over a period of 
18 days, followed by a return to growth (Fig. 5d). During the initial 
18-day period, a large fraction of the tumours was scored as eliciting at 
least a 30% reduction in tumour activity (Fig. 5e). The overall response 
led to roughly a 50% increase in the mean survival time for animals 
harbouring incurable colorectal metastases (Fig. 5f). Improvements 
may arise from strategies for long-term circuit stability or the utilization 
of additional therapeutic cargo. 
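
The responder scoring described above (at least a 30% reduction in tumour activity, that is, a relative activity of 0.70 or less) can be sketched as follows. This is only an illustrative sketch: the function name and the example values are hypothetical and are not taken from the study data.

```python
def responder_fraction(relative_activity, threshold=0.70):
    """Fraction of animals whose relative tumour activity is <= threshold.

    relative_activity: tumour signal divided by its pre-treatment value,
    one number per animal (hypothetical input format).
    """
    if not relative_activity:
        raise ValueError("need at least one measurement")
    responders = sum(1 for value in relative_activity if value <= threshold)
    return responders / len(relative_activity)


# Hypothetical relative tumour activities for six animals on one day.
example = [0.45, 0.62, 0.70, 0.95, 1.30, 0.55]
print(responder_fraction(example))
```

Repeating the calculation for each imaging day would give a response curve of the kind plotted in Fig. 5e.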

The synchronized lysis circuit exemplifies a methodology for 
leveraging the tools of synthetic biology to exploit the ability of certain 
bacteria to colonize disease sites. In contrast to most drug delivery strat- 
egies, the synchronized lysis paradigm does not require pre-loading 
of a drug or the engineering of additional secretion machinery. In 
addition, it has the potential to decrease the likelihood of a systemic 
inflammatory response through population control; as the bacterial 
colony is pruned after each oscillatory lysis event, the design could 
mitigate an undesirable host response. The circuit may enable new 
bacterial drug delivery strategies through modulation of the frequency 
and amplitude of the population cycles over time. Given recent insights 
into how host metabolism and circadian function are affected by the 
population dynamics of the gut microbiota, cyclical population control 
may be a prospective strategy to prevent host disturbances resulting 
from aberrant oscillations of gut microbes. Such engineering 
strategies may allow for the development of therapeutic communities 
within in vivo environments in which population dynamics are driven 
by interacting viruses, bacteria and host immune cells. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS 


Strains and plasmids. Our circuit strains were cultured in LB media with 
50 μg ml⁻¹ and 34 μg ml⁻¹ of kanamycin and chloramphenicol respectively, 
along with 0.2% glucose, in a 37 °C incubator. Mammalian cells (HeLa CCL-2 
from ATCC, verified by third-party cell line authentication services using an STR 
multiplex system) were cultured in DMEM media supplemented with 10% fetal 
bovine serum and 1% penicillin/streptomycin (CellGro 30-002-CI), placed inside a 
tissue culture incubator at 37 °C maintained at 5% CO2. Plasmids were constructed 
using the CPEC method of cloning or using standard restriction digest/ligation 
cloning. The activator plasmid (Kan, ColE1) was used in previous work from our 
group, and the lysis plasmid was constructed by taking the lysis gene, E, from the 
ePop plasmid via PCR and cloning it into a vector (Chlor, p15A) under the control 
of the luxI promoter. The hlyE gene was obtained via PCR from the genomic 
DNA of MG1655, while mouse CCL21 and CDD-iRGD were synthesized. These 
genes were cloned into the lysis plasmid, under the control of either the tac or luxI 
promoters (Extended Data Fig. 5). Co-culturing was performed with HeLa cells 
and either motile or non-motile S. Typhimurium, SL1344 (Extended Data Table 1). 
For full strain and plasmid information, please refer to the Supplementary 
Information. 

Microfluidics and microscopy. The microfluidic devices and experiment prepara- 
tion protocols used in this study are similar to those previously reported from our 
group. The bacteria growth chambers were 100 × 100 μm in area and 
approximately 1.4 μm in height. For co-culture experiments on the chip, we first loaded a 
suspended culture of HeLa cells in the device media channels at very low flow rates, 
to allow for adherence, and then incubated the device in a tissue culture incubator 
for 0.5-2 days to allow for proliferation. On the day of the experiment, the device 
was transferred to the microscope and circuit-containing bacteria were loaded in 
the growth chambers before imaging. Acquisition of images was performed with 
a Nikon TI2 using a Photometrics CoolSnap cooled CCD camera. The scope and 
accessories were programmed using the Nikon Elements software. 

Co-cultures for well plate experiments were performed in Falcon 96-well tissue 
culture plates. HeLa cells were allowed to adhere to the wells before the addition 
of bacteria and subsequent imaging under the microscope or measurement in a 
TECAN Infinite M200 Pro plate reader. For viability measurements using an MTT 
assay, there were two technical replicates per well. For fluorescence measurements 
of co-cultures with variable seeding densities of bacteria, there were three technical 
replicates per case. 

Additional details on microfluidics and microscopy can be found in the 
Supplementary Information. 

In vivo experiments. All animal work was approved by the committee on animal 
care (MIT, protocol 0414-022-17). The protocol requires animals to be euthanized 
when tumors reach 2 cm’, or under veterinary staff recommendation. The cell line 
(MC26-LucF, Tanabe laboratory, Massachusetts General Hospital) was obtained 
from, and authenticated by, the Tanabe laboratory, MGH. The cell line was tested 
several times to be mycoplasma-free before implantation in mice. Sample sizes for 
mice were determined by expected effect size to produce a power of 0.8-0.9. Mice 
were blindly randomized into various groups using a random number generator. 
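
The text states only that group sizes were chosen for a power of 0.8-0.9 given an expected effect size; it does not give the formula or the effect size used. As a rough illustration of how such a calculation can work, the sketch below uses the common normal-approximation formula for a two-sided, two-group comparison of means. The function name, the significance level of 0.05 and the standardized effect size of 1.0 are all assumptions made for the example.

```python
from math import ceil
from statistics import NormalDist


def mice_per_group(effect_size, power=0.8, alpha=0.05):
    """Animals per group for a two-sided, two-group comparison of means.

    Normal approximation: n = 2 * ((z_alpha + z_power) / d) ** 2, where d is
    the standardized effect size. All inputs here are illustrative assumptions.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Assumed standardized effect size of 1.0 at the two power levels mentioned.
for target_power in (0.8, 0.9):
    print(target_power, mice_per_group(effect_size=1.0, power=target_power))
```

An exact calculation based on the t distribution would give slightly larger groups for small samples; the approximation is used here only to keep the sketch dependency-free.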


Subcutaneous tumour model. Animal experiments were performed on 6-week-old 
female BALB/c mice (Taconic Biosciences) with bilateral subcutaneous hind flank 
tumours from an implanted mouse colon cancer cell line. The concentration for 
implantation of the tumour cells was 10⁸ cells per ml in DMEM (no phenol red). 
Cells were then implanted subcutaneously at a volume of 100 μl per flank, with 
each implant consisting of 10⁷ cells. Tumours were typically grown to an average 
of 300 mm³ before experiments. 

Experimental liver metastasis model. The experimental metastasis model was 
generated by injecting luciferase-producing mouse cancer cells into surgically 
externalized spleens of immunocompetent mice. Tumour cells seeded the liver 
during 90 s, after which the spleen was removed to prevent ectopic tumour growth (ref. 30). 
The MC26-LucF cell line was used (Tanabe Laboratory, MGH) and injected at 
5 × 10⁴ cells per 100 μl PBS into the spleens of female BALB/c mice at 6 weeks of 
age (Taconic Biosciences). For the liver metastasis model, tumours were grown 
for 5-7 days to an average total tumour burden of 143 mm³ before experiments. 

Bacterial growth and administration. Bacterial strains were grown overnight in 
LB media containing appropriate antibiotics and 0.2% glucose as for the in vitro 
experiments. A 1:100 dilution in fresh media with antibiotics was started the day 
of injection and grown until an OD < 0.1 to prevent bacteria from reaching the 
quorum threshold (for SLC specifically). Bacteria were spun down and washed 2 
to 3 times with sterile PBS before injection into mice. Intratumoural injections of 
bacteria were performed at a concentration of 5 × 10⁷ cells per ml in PBS with a 
total volume of 10-20 μl injected per tumour, while intravenous injections were 
given at a total volume of 100 μl. For the SLC-3 strains injection, this final volume 
was equally divided between the three strains at the indicated density. For liver 
metastasis experiments, bacteria were grown in LB media containing appropriate 
antibiotics and 0.2% glucose until they reached an OD of 0.05, after which they 
were concentrated to 10° to 5 x 10° bacteria per ml and delivered via oral gavage. 

Post-administration monitoring for subcutaneous and liver metastasis models. 
Luminescent signal was measured with the IVIS spectrum in vivo imaging system 
following bacterial injection. Measurements were compared relative to pre-injection 
values to follow dynamics. Subcutaneous tumour volume was quantified using 
calipers to measure the length, width, and height of each tumour throughout the imaging 
course (V = L × W × H). Volumes were compared to pre-injection values to follow 
physical tumour growth. Survival of mice was measured as the time from the 
beginning of the experiment up to the day when mice were moribund and euthanized. 
Survival for the experiment in Fig. 4f was measured with two biological replicates. 
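
As a minimal sketch of the volume calculation just described, the code below computes the product of length, width and height for each caliper measurement and normalizes to the pre-injection value. The function names and the example measurements are hypothetical.

```python
def tumour_volume(length_mm, width_mm, height_mm):
    """Tumour volume in cubic millimetres as length x width x height."""
    return length_mm * width_mm * height_mm


def relative_volumes(measurements):
    """Volumes relative to the first (pre-injection) measurement.

    measurements: list of (length, width, height) tuples in mm, ordered in
    time, with the pre-injection measurement first (hypothetical format).
    """
    volumes = [tumour_volume(*m) for m in measurements]
    baseline = volumes[0]
    if baseline <= 0:
        raise ValueError("pre-injection volume must be positive")
    return [v / baseline for v in volumes]


# Hypothetical caliper readings (mm) for one tumour at three time points.
readings = [(8, 6, 5), (9, 6, 5), (6, 5, 4)]
print(relative_volumes(readings))  # [1.0, 1.125, 0.5]
```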

Statistical analysis. Statistical tests were calculated either in Excel (Student's 
t-test) or GraphPad Prism 5.0 (ANOVA with Bonferroni post-test, log-rank test). 
The details of the statistical tests carried out are indicated in the respective figure 
legends. Where data were approximately normally distributed, values were 
compared using either a Student's t-test or one-way ANOVA for single variable, or a 
two-way ANOVA for two variables. Mice were randomized in different groups 
before experiments. 
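
The authors ran these tests in Excel and GraphPad Prism. Purely to illustrate what the two-sample Student's t-test computes, the sketch below derives the pooled-variance t statistic and its degrees of freedom with the Python standard library. The function name and the data are hypothetical, and the P value is deliberately left out because it requires the t distribution.

```python
from math import sqrt
from statistics import mean, variance


def student_t(sample_a, sample_b):
    """Pooled-variance (equal-variance) two-sample t statistic and d.f."""
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two values")
    dof = n_a + n_b - 2
    # Pooled estimate of the common variance, weighted by degrees of freedom.
    pooled = ((n_a - 1) * variance(sample_a)
              + (n_b - 1) * variance(sample_b)) / dof
    if pooled == 0:
        raise ValueError("samples have zero variance")
    t = (mean(sample_a) - mean(sample_b)) / sqrt(pooled * (1 / n_a + 1 / n_b))
    return t, dof


# Hypothetical measurements for two treatment groups.
t_stat, dof = student_t([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
print(round(t_stat, 4), dof)
```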


30. Soares, K. C. et al. A preclinical murine model of hepatic metastases. JoVE 
e51677 (2014). 
Extended Data Figure 1 | Various properties of the SLC. a, The fraction 
and number of bacterial cells cleared per consecutive oscillatory cycle 
in the growth chamber for a typical microfluidic experiment for 
S. Typhimurium, including the effects of lysis and flow of cells outside of 
the trap (strain 1). b, Subset of time series images from the experiment in 
a showing a portion of the growth chamber where survivors of the initial 
lysis event (160 min frame, red outline) produce progeny (250 min frame, 
magenta outline) which are lysis sensitive. c, Period as a function of the 
environmental temperature for E. coli (strain 13). The circuit does not 
oscillate for temperatures above 37 °C in E. coli. Error bars indicate ± 1 s.d. 
for 12-19 peaks. d, Colony amplitude at quorum firing for increasing 
degradation on the LuxI activator protein in the computational model. 
These simulation results are supported by batch well-plate experiments 
of the Lux ssrA (black, strain 2) and non-ssrA (blue, strain 1) tagged 
versions of the circuit in S. Typhimurium (inset). 
sfGFP visualization after release. b, Number of bacteria (red), bacterial 
fluorescence (blue), sink fluorescence (pink) for a typical oscillatory 

co-culture experiments in a microfluidic device (also see Supplementary 
Information). 
Extended Data Figure 3 | In vivo expression and therapy testing. 
a, End-point in vitro luminescence intensity for SLC strains after ~20 h 
of growth. Host strains A and B are the host bacteria for strains 8 and 10. 
They are ELH1301 and ELH430, respectively. Host A exhibits around 
twofold higher luminescence with the same circuit than host B. b, In vivo 
imaging over time of a mouse bearing subcutaneous tumours injected with 
a genomically integrated constitutively luminescent strain (strain 9). 
c, End-point in vivo bacterial luminescence of the SLC-hly strain and the 
constitutively luminescent strain from the experiments presented in Fig. 4. 
Error bars represent the s.e.m. bacterial luminescence from 9 tumours. 
d, Post-injection in vivo bacterial luminescence for the constitutively 
luminescent strain administered intravenously (vein) or intratumorally 
(tumour). Luminescence was measured ~20 h post-injection. Error bars 
represent s.e.m. bacterial luminescence from 6 and 9 tumours for the 
intravenous and intratumoural cases, respectively. e, Average relative 
tumour volume over time for subcutaneous tumour bearing mice injected 
with the no-plasmid bacterium (strain 7), 5-FU chemotherapy, the SLC-3 
strains, and the combination of SLC-3 with chemotherapy. Bacteria 
were injected intratumorally on days 0, 4, and 7 (black arrows), and 
chemotherapy was administered on days 2 and 9 (red arrows) (*P < 0.05, 
**P < 0.0001, two-way ANOVA with Bonferroni post-test, n = 12-16 
tumours, error bars represent s.e.). f, Fraction of mice from the cases 
in e which respond with 30% reduction of tumour volume over time. 
g, Fraction survival over time for mice with hepatic colorectal metastases 
fed with either the SLC-3 strains (blue) or the no-plasmid control (black) 
(*P < 0.05, log rank test; n = 11-12 mice). 
Extended Data Figure 4 | Histological analysis of tumour sections. 
a, Histology of tumour sections taken from mice with different treatments 
3 days post-administration. Haematoxylin and eosin staining for tissue 
sections intravenously injected with a combination of therapeutic bacteria 
(SLC-3), chemotherapy (5-FU), or a bacteria control with no therapeutic 
(strain 7) (i); TUNEL staining (red) in the same sections indicating cell 
apoptosis (ii); Salmonella immunohistochemistry (red) in the same 
sections confirming presence of bacteria in tumours (iii). Scale bars for 
(i-iii) denote 50 μm. TUNEL (iv) and Salmonella (v) staining (red) in 
the entire tumour sections (examples indicated by arrows). Scale bars 
for iv and v denote 100 μm. DAPI staining (blue) was used to obtain a 
measure of live and dead cells in ii-iv. Histology slices (n = 6) from 20× 
images were compared across the groups and mean intensity of TUNEL 
staining, normalized by sample area, was demonstrated to be significantly 
higher for SLC-3 compared to the other two groups (P < 0.0001, one-way 
ANOVA), and not significantly different between the chemotherapy and 
bacteria-only cases. 
Extended Data Figure 5 | The main plasmids used in this study. See Supplementary Information for more details. 


© 2016 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 


Extended Data Table 1 | A list of strains and respective plasmids used in this study 

Strain #  Strain name   Host bacterium    Plasmid(s) 
1         MOD47         SL1344, M913      pTD103 luxI (-LAA) sfGFP + pZA35 X714E (+LuxR) 
2         MOD46a        SL1344, M913      pTD103 luxI sfGFP + pZA35 X714E (+LuxR) 
3         MOD67         SL1344, M913      pTD103 luxI (-LAA) sfGFP + pZA35 X714E (+LuxR) ptac::HlyE 
4         MOD61         SL1344, ELH1301   pTD103 luxI sfGFP + pZA35 X714E (+LuxR) ptac::HlyE 
5         MOD64         SL1344, ELH1301   pTD103 luxI sfGFP + pZA35 X714E (+LuxR) 
6         MOD65         SL1344, ELH1301   pZA35 ptac::HlyE 
7         ELH1301       SL1344, ELH1301   N/A 
8         MOD105        SL1344, ELH430    pZE25 luxI luxCDABE hok/alp + pZA35 X714E (+LuxR) pLux::HlyE hok/alp 
9         EcN-luxCDABE  Nissle 1917       N/A 
10        MOD101        SL1344, ELH1301   pZE25 luxI luxCDABE hok/alp + pZA35 X714E (+LuxR) pLux::HlyE hok/alp 
11        MOD102        SL1344, ELH1301   pZE25 luxI luxCDABE hok/alp + pZA35 X714E (+LuxR) ptac::HlyE hok/alp 
12        MOD69         SL1344, ELH1301   pTD103 LuxCDABE hok/alp + pZA35 X714E (+LuxR) ptac::HlyE hok/alp 
13        MOD29         JS006, BW25113    pTD103 luxI sfGFP + pZA35 X714E (+LuxR) 
14        MOD110        SL1344, ELH1301   pZE25 luxI luxCDABE hok/alp + pZA35 X714E (+LuxR) pLux::CDD-iRGD hok/alp 
15        MOD112        SL1344, ELH1301   pZE25 luxI luxCDABE hok/alp + pZA35 X714E (+LuxR) ptac::mCCL21 hok/alp 

See Supplementary Information for more details. 
CD47-blocking antibodies restore phagocytosis and prevent atherosclerosis 


Yoko Kojima, Jens-Peter Volkmer, Kelly McKenna, Mete Civelek, Aldons Jake Lusis, Clint L. Miller, Daniel Direnzo, 
Vivek Nanda, Jianqin Ye, Andrew J. Connolly, Eric E. Schadt, Thomas Quertermous, Paola Betancur, Lars Maegdefessel, 
Ljubica Perisic Matic, Ulf Hedin, Irving L. Weissman & Nicholas J. Leeper 


Atherosclerosis is the disease process that underlies heart attack 
and stroke. Advanced lesions at risk of rupture are characterized 
by the pathological accumulation of diseased vascular cells and 
apoptotic cellular debris. Why these cells are not cleared remains 
unknown. Here we show that atherogenesis is associated with 
upregulation of CD47, a key anti-phagocytic molecule that is known 
to render malignant cells resistant to programmed cell removal, or 
‘efferocytosis’. We find that administration of CD47-blocking 
antibodies reverses this defect in efferocytosis, normalizes the 
clearance of diseased vascular tissue, and ameliorates atherosclerosis 
in multiple mouse models. Mechanistic studies implicate the 
pro-atherosclerotic factor TNF-α as a fundamental driver of 
impaired programmed cell removal, explaining why this process is 
compromised in vascular disease. Similar to recent observations in 
cancer, impaired efferocytosis appears to play a pathogenic role in 
cardiovascular disease, but is not a fixed defect and may represent 
a novel therapeutic target. 

Each day the human body turns over more than 100 billion cells. 
To prevent the inflammatory consequences associated with the 
accumulation of apoptotic debris, these cells are rapidly and efficiently 
cleared through a phagocytic process known as programmed cell 
removal, or ‘efferocytosis’. Programmed cell removal is mediated 
by macrophages detecting phagocytic ‘eat me’ signals on the target 
cell surface, and can be countermanded by cell-surface expression 
of anti-phagocytic ‘don't eat me’ signals such as expression of CD47 
(ref. 6). Whereas programmed cell removal is highly conserved across 
almost all physiological conditions and in all tissues, it appears to be 
significantly impaired in atherosclerotic cardiovascular disease, the 
leading cause of death worldwide. Atherosclerosis is characterized by 
the accumulation of diseased macrophages and vascular smooth muscle 
cells (SMCs), which not only encroach on the lumen of the associated 
vessel but may also undergo programmed cell death. The impaired 
clearance of these diseased cells by lesional macrophages is thought to 
explain why these cells are frequently observed in the atherosclerotic 
necrotic core, and may potentiate vascular inflammation and risk for 
eventual plaque rupture. However, the mechanism underlying this 
defect has not yet been identified. 

We recently found that the key anti-phagocytic molecule, CD47, 
is paradoxically upregulated by a variety of cancers. This renders 
malignant cells resistant to classic immune surveillance machinery 


Figure 1 | CD47 is upregulated in atherosclerosis. a, Microarray expression 
profiling in two carotid endarterectomy cohorts 
reveals that CD47 expression is significantly 
increased in human atherosclerotic plaque, 
relative to non-diseased vascular tissue (data 
displayed as Tukey box plots, n = 182 subjects). 
b, Immunostaining identifies intense CD47 
upregulation within the necrotic core of human 
atherosclerotic coronary artery lesions (left) 
and carotid plaques (right). c, TaqMan mRNA 
analysis confirms that vascular CD47 expression 
progressively increases in a mouse model of 
atherosclerosis (apoE−/− mice fed high-fat diet, 
grey), relative to control animals (C57BL/6 mice 
fed chow, white, n = 4 mice per time point). 
d, Immunohistochemistry staining with a biotin-labelled 
antibody (brown) reveals that CD47 
expression co-localizes with apoptotic tissue in 
murine atherosclerotic plaque. ***P < 0.001, 
**P for trend < 0.03. Error bars represent the 
standard error of the mean (s.e.m.). 
Figure 2 | Inhibition of CD47 stimulates efferocytosis and prevents 
atherosclerosis. a, Compared to mice treated with control antibodies 
(IgG, n = 16), mice treated with inhibitory anti-CD47 antibodies (n = 15) 
develop significantly smaller atherosclerotic plaques, as measured by Oil 
Red O (ORO) content in the aortic sinus. b, Total aortic atherosclerosis 
content is also reduced. c, d, Inhibition of CD47 signalling does not 
alter the rate of programmed cell death in vitro (c), but does reduce the 
accumulation of apoptotic bodies in vivo (d). e, Anti-CD47 antibody 
promotes efferocytosis of vascular cells at baseline and after exposure 
to pro-atherosclerotic lipids. Representative FACS phagocytosis plots 
for lipid-loaded (72 h) SMCs shown on the right (all assays repeated in 
triplicate). f, In vivo anti-CD47 antibody reduces the number of ‘free’ 
apoptotic bodies not associated with phagocytic macrophages, potentially 

such as the tumoricidal macrophage, and is now recognized as a 
fundamental driver of tumour growth. To determine if dysregulated 
CD47 may also contribute to atherogenesis, we evaluated its expression 
in two independent human vascular tissue biobanks. We found 
that CD47 is consistently upregulated in human atherosclerotic plaque 
compared to non-atherosclerotic vascular tissue (Fig. 1a), and in sub- 
jects with symptomatic cerebrovascular disease (stroke or transient 
ischaemic attack) compared to those with stable asymptomatic lesions 
(Extended Data Fig. 1a). Because some efferocytosis molecules are 
known to undergo post-translational modification, we also performed 
immunofluorescence and immunohistochemical staining of human 
coronary and carotid arteries, which confirmed that CD47 is progres- 
sively upregulated during atherogenesis and appears to localize intensely 
to the necrotic core (Fig. 1b, Extended Data Fig. 1b-g). Similar findings 
were observed in mouse models of atherosclerosis and other publicly 
available microarray data sets (Fig. 1c, d, Extended Data Fig. 2). 
Together, these data suggest that pathologic upregulation of ‘don't eat 
me’ molecules may explain why phagocytosis is impaired within the 
human atherosclerotic plaque, which may promote lesion expansion 
over time. 



indicative of increased efferocytosis (stars indicate ‘free’ apoptotic bodies, 
arrows indicate ‘not-free’ apoptotic bodies). g, Electron microscopy 
confirms that mice treated with anti-CD47 antibodies display features of 
enhanced intraplaque efferocytosis, including an increased prevalence of 
macrophages which had ingested multiple apoptotic bodies (white arrows) 
and a reduced burden of ‘free’ apoptotic bodies (yellow arrows). h, Mice 
treated with anti-CD47 antibodies develop smaller necrotic cores than 
mice treated with IgG. i, Anti-CD47 antibody inhibits phosphorylation 
of lesional SHP1, a key anti-phagocytic effector molecule known to signal 
downstream of CD47. STS, staurosporine. Comparisons made by two- 
tailed t-tests. ***P < 0.001, **P < 0.01, *P < 0.05. Error bars represent 
s.e.m. 


To determine if this defect could be exploited as a translational 
target for cardiovascular disease, we treated a cohort of atheroprone 
animals (apolipoprotein-E-deficient (apoE−/−) mice implanted 
with angiotensin-II-infusing minipumps) with an inhibitory 
antibody directed against CD47 (Extended Data Fig. 3a). Compared 
to IgG control, anti-CD47 antibody treatment was associated with 
a marked reduction in atherosclerosis, both in the aortic sinus and 
en face in the aorta (Fig. 2a, b, Extended Data Fig. 3b, c). Similar 
results were observed in several additional models, including models 
of chronic atherosclerosis, plaque vulnerability and in mice with 
established lesions, as would be encountered clinically (Extended Data 
Fig. 3d-h). Although anti-CD47 antibodies had no effect on apoptosis 
in vitro (Fig. 2c, Extended Data Fig. 4a, b), we observed signifi- 
cantly fewer apoptotic bodies in the lesions of anti-CD47-treated 
animals in vivo (Fig. 2d). To reconcile this discrepancy, we used an 
established in vitro phagocytosis assay, and found that anti-CD47 
antibodies potently induced the clearance of diseased and apoptotic 
vascular SMCs and macrophages that had been exposed to oxidized 
phospholipids to simulate the atherosclerotic environment (Fig. 2e, 
Extended Data Fig. 4c-f). Similarly, the number of ‘free’ apoptotic 
Figure 3 | The pro-atherosclerotic cytokine TNF-α induces CD47 
expression and renders vascular cells resistant to phagocytic clearance. 
a, Ingenuity Pathway Analysis identifies TNF-α as the regulator most 
likely to be upstream of genes that are co-expressed with CD47 in vascular 
tissue ex vivo. b, c, Co-expression studies confirm that CD47 is positively 
correlated with the canonical TNF-α receptor, TNFR1, in human coronary 
plaque (b) and TNF-α levels in human carotid plaque (c). The Pearson 
correlation coefficient was determined assuming a Gaussian distribution 
and P values were determined using a two-tailed test shown with the 
95% confidence band of the best fit line. d, In vitro, TNF-α treatment 


bodies not associated with an intraplaque macrophage (indicative of 
failed efferocytosis) was reduced after anti-CD47 antibody treatment 
in vivo, as suggested by co-localization studies (Fig. 2f, Extended Data 
Fig. 5a) and electron microscopy (Fig. 2g, Extended Data Fig. 5b). 
Ultimately, these animals accumulated less apoptotic debris and devel- 
oped lesions with smaller necrotic cores (Fig. 2h, Extended Data Fig. 5c). 
From a mechanistic perspective, anti-CD47 antibody therapy 
was associated with a marked suppression of intraplaque SHP1 
phosphorylation, confirming interruption of the anti-phagocytic 
signalling axis downstream of SIRPα, the cognate anti-phagocytic 
receptor of CD47 (Fig. 2i, Extended Data Fig. 5d). These findings 
indicate that targeting CD47 can reduce atherosclerosis, and appears to 
do so by specifically reactivating efferocytosis within the lesion, without 
altering programmed cell death itself. 

In addition to regulating phagocytosis, CD47 is also known to serve 
as a receptor for the vasoactive and nitric-oxide-regulating cytokine 
thrombospondin-1 (TSP1). However, mice treated with anti-CD47 
antibodies and IgG control had similar systemic blood pressures 
(Extended Data Fig. 6a) and rates of pulmonary nitric oxide elaboration 
(Griess reaction, Extended Data Fig. 6b), suggesting that anti-CD47 
antibodies did not alter endothelial function in vivo. Further, anti-CD47 
antibodies did not influence TSP1 signalling in vitro, having no 
modifying effect on TSP1-dependent MAPK signalling, eNOS 
phosphorylation, SMC proliferation or phagocytosis rates (Extended 
Data Fig. 6c-f). These data suggest that the anti-CD47 antibodies used 
in this study mediated anti-atherosclerotic effects independently of 
TSP1 signalling. Although full-dose anti-CD47 antibody therapy was 
again found to promote splenic erythrophagocytosis and compensatory 
significantly increases the basal expression of CD47 in vascular SMCs, 
and blunts the decrease expected to occur during apoptosis. e, f, Flow 
cytometry (e) and fluorescent microscopy (f) confirm that TNF-α 
increases the cell-surface expression of CD47 on vascular cells at baseline 
and during programmed cell death. ICC, immunocytochemistry. 
g, In vitro efferocytosis assays indicate that TNF-α treatment renders 
vascular SMCs resistant to programmed cell clearance under a variety of 
pro-atherosclerotic conditions. Comparisons made by two-tailed t-tests. 
***P < 0.001, **P < 0.01. Error bars represent s.e.m. 


reticulocytosis (CD47 is a critical marker of self that is downregulated 
on ageing red blood cells), this toxicity was self-limited and anaemia 
was not observed with chronic administration. Otherwise, this 
pro-efferocytic agent appeared to be well-tolerated, having no discernible 
effect on circulating leukocytes, lipid levels, or other metabolic 
parameters relevant to vascular disease (Extended Data Fig. 6g-u and 
Extended Data Table 1a). 

To investigate why CD47 is upregulated in atherosclerosis, we used 
a bioinformatic approach in which we tested for genetic co-expression 
across panels of mouse and human vascular tissue. This yielded a 
list of genes that were significantly co-expressed with CD47 in vivo 
(Extended Data Fig. 7a). Pathway analyses of these genome-wide data 
sets identified ‘inflammation mediated by chemokine and cytokine 
signaling pathway’ as the top pathway associated with CD47 in vascular 
tissue (Extended Data Fig. 7b and Extended Data Table 1b, c), and 
specifically implicated TNF-α as the factor most likely to function as its 
upstream regulator (Fig. 3a). Subsequent correlation studies identified 
a strong positive association between CD47 and both the canonical 
TNF receptor, TNFR1 (Fig. 3b), as well as TNF-α itself (Fig. 3c), in 
human atherosclerotic vessel specimens. Similarly, CD47 expression 
levels were correlated with TNF-α levels in tissue from atherosclerotic 
mice and confirmatory human data sets (Extended Data Fig. 7c, d). 
Together, these informatics and co-expression studies implicate 
TNF-α (a proinflammatory cytokine known to be upregulated in 
atherosclerosis) in CD47-dependent vascular disease. 
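
The co-expression analyses above rest on the Pearson correlation coefficient between paired expression measurements. As a minimal sketch of that quantity, the code below computes it from scratch; the function name and the expression values are hypothetical, and the P values and confidence bands reported in the figures are not computed here.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of products of deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical paired expression levels (arbitrary units) in five samples.
cd47_expression = [1.0, 1.4, 2.1, 2.9, 3.5]
tnf_expression = [0.8, 1.1, 2.0, 2.4, 3.6]
print(round(pearson_r(cd47_expression, tnf_expression), 3))
```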

To investigate the causality of these associations, we next tested 
whether CD47 is directly downstream of TNF-α. We found that 
treatment of vascular SMCs with recombinant TNF-α led to a 
Figure 4 | TNF-α promotes CD47 expression via NF-κB1 and is a 
translational cardiovascular target. a, b, Co-expression analyses confirm 
that NF-κB1 is significantly correlated with CD47 expression in both 
human coronary (a) and carotid (b) atherosclerotic plaque. Pearson 
correlation coefficients were determined assuming a Gaussian distribution 
and P values were determined using a two-tailed test. c, Dual luciferase 
reporter assays reveal that CD47 promoter activity is stimulated in cells 
treated with TNF-α (top), but that this effect is significantly enhanced 
in cells co-transfected with an NF-κB1 expression vector. d, Chromatin 
immunoprecipitation studies confirm significant enrichment of NF-κB1 
protein on the CD47 promoter in TNF-α-treated human coronary artery 
SMCs. e, In vitro efferocytosis assays reveal that anti-CD47 antibody 


consistent upregulation of cellular CD47 expression, whereas no effect 
was observed with a variety of other common pro-atherosclerotic or 
proinflammatory insults (Extended Data Fig. 7e-h). Further, TNF-α
blunted the progressive decrease in CD47 expression normally
expected to occur during programmed cell death (Fig. 3d-f, Extended
Data Fig. 7i-l). As a result of their higher levels of anti-phagocytic
molecules, TNF-α-treated cells were less likely to be phagocytosed by
macrophages, particularly when concomitantly exposed to oxidized
low-density lipoproteins (oxLDL) and pro-apoptotic stimuli (Fig. 3g,
Extended Data Fig. 7m). Because impaired efferocytosis is known
to incite proinflammatory cytokine elaboration, it is possible that a
positive feedback loop underlies the co-localization of TNF-α, CD47,
and uncleared pathological cells and apoptotic bodies within the
atherosclerotic plaque.

Analysis of the CD47 promoter in vascular SMCs revealed a region 
of open chromatin predicted to contain binding sites for several of 
the NF-κB transcription factors known to be downstream of TNFR1
(Extended Data Fig. 8a and Extended Data Table 1d). Among these,
the classical proinflammatory factor NF-κB1 (p50) was found to be
positively correlated with CD47 expression in both human coronary
and carotid plaques (Fig. 4a, b, Extended Data Fig. 8b). Luciferase
reporter assays performed with a vector containing the CD47 promoter
revealed that TNF-α treatment stimulated basal CD47 expression, and
that the effect was specifically enhanced when NF-κB1 was
simultaneously overexpressed in these cells (Fig. 4c, Extended Data Fig. 8c-e).
Chromatin immunoprecipitation assays confirmed that NF-κB1 binds
to the CD47 promoter in vitro, and that this occupancy is increased
several fold upon treatment with TNF-α (Fig. 4d, Extended Data Fig. 8f).

From a translational perspective, we found that anti-CD47 antibodies
were able to stimulate efferocytosis in TNF-α-treated cells, and that the




enhances the clearance of cells exposed to TNF-α, and that its
pro-efferocytic capacity is enhanced under pro-atherosclerotic conditions.
f, Pretreatment with the anti-TNF-α monoclonal antibody infliximab
prevents the upregulation in Cd47 mRNA that normally occurs in SMCs
exposed to TNF-α. g, Concomitant inhibition of CD47 and TNF-α using
anti-CD47 antibodies and infliximab, respectively, produces synergistic
benefit in the clearance of diseased vascular cells, as assessed by ANOVA.
h, Putative mechanism explaining why efferocytosis is impaired in
cardiovascular disease, and how inhibition of CD47-SIRPα signalling
could represent a new therapeutic target. Comparisons made by two-tailed
t-tests, unless otherwise specified. ***P < 0.001, **P < 0.01, *P < 0.05.
Error bars represent s.e.m.


effect was most pronounced under dyslipidaemic, pro-atherosclerotic
conditions (Fig. 4e). A modest incremental benefit was observed when
anti-CD47 antibody therapy was combined with commercially available
anti-TNF-α therapies, such as infliximab or etanercept, probably because
of their inhibitory influence on CD47 expression in mouse and human
tissue (Fig. 4f, g, Extended Data Fig. 9). These data are particularly
provocative given the observation that patients prescribed TNF-α-inhibiting
antibodies for inflammatory disorders such as lupus and rheumatoid
arthritis appear to be protected from myocardial infarction.

The finding that CD47 expression is pathologically upregulated 
in both cancer and cardiovascular disease suggests a commonality 
between these two conditions. In leukaemogenesis, cancer stem cells 
out-compete normal haematopoietic stem cells, while countering 
signalling associated with programmed cell removal; viable myelo- 
dysplastic syndrome haematopoietic oligolineage progenitors express 
the phagocytic signal calreticulin, but not CD47, whereas acute myeloid 
leukaemia derived from myelodysplastic syndromes are positive for 
CD47 expression. Similar cellular processes in the vasculature may 
explain the recent observations that de-differentiated SMCs undergo 
clonal expansion within the atherosclerotic plaque”>”®. Furthermore, 
the top cardiovascular locus identified by genome-wide association 
studies surprisingly resides near an important tumour suppressor 
locus’, which in turn regulates SMC efferocytosis'”. Future studies will 
need to examine whether expansion of CD47" SMC clones contributes 
to atherosclerosis, and if their clearance can be accomplished without 
the induction of anaemia (for example, with a dose-escalation approach 
that appears to be safe in non-human primates”* and is currently being 
pursued in first-in-human clinical trials?’). 

Together, these data provide insights into why programmed cell 
removal is impaired in the atherosclerotic plaque, and how this may 
promote lesion expansion. Our findings bolster the ‘inflammatory
hypothesis’ of atherosclerosis, and specifically link cytokine
signalling with anti-phagocytic signalling in vascular disease. Given the
experimental success of pro-efferocytic therapies in the oncology field
using antibodies that block the CD47 signal, it is possible that these
findings will provide a novel nonsurgical treatment of cardiovascular 
disease (mechanism shown in Fig. 4h). 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 

METHODS


Data reporting. The experiments were not randomized. The investigators were not 
blinded to allocation during experiments, but were blinded during data analysis 
and interpretation. No statistical methods were used to predetermine sample size. 
Human cardiovascular tissue. Carotid endarterectomy samples. In this study, 
a total of 182 human carotid endarterectomy samples and nonatherosclerotic 
control arteries were used. These include heterogeneous atherosclerotic plaque 
samples obtained from patients undergoing surgery for symptomatic (stroke or 
transient ischaemic attack) or asymptomatic (no history of cerebrovascular event) 
high-grade carotid stenosis (>50% NASCET criteria) as part of the Biobank of 
Karolinska Endarterectomies (BiKE). 15 nonatherosclerotic control arteries (iliac 
artery and aorta) were obtained from organ donors without any history of
cardiovascular disease. Patients were consecutively enrolled in the study, with the first
127 constituting the discovery cohort (40 asymptomatic, 87 symptomatic) and the 
next 50 constituting the validation cohort (10 asymptomatic, 40 symptomatic). All 
samples were collected with informed consent from patients, organ donors or their 
guardians. The BiKE study was approved by the Ethical Committee of the Northern 
Stockholm. DNA and RNA were extracted from these specimens and analysed
by Illumina 610w-QuadBead SNP-chips and the Affymetrix HG-U133 plus 2.0
microarrays (discovery cohort) or the Affymetrix HG-U133a Genechip arrays
(validation cohort), as previously described, and deposited in Gene Expression
Omnibus (accession number GSE21545). Robust multi-array average (RMA)
normalization was performed and processed gene expression data were returned in
log2-scale. The relative expression of each gene was determined by comparing the
pixel intensity of the 11 probe pairs which correspond to each transcript to the 
‘normalization control set’ specific to each array (see http://www.affymetrix.com/ 
support/technical/technotes/expression_comparison_technote.pdf and http:// 
www.affymetrix.com/support/technical/datasheets/hgu133arrays_datasheet.pdf). 
Student’s t-test with correction for multiple comparisons according to the
Sidak-Bonferroni method was used for statistical analyses of microarray data. Pearson's
correlations were calculated to determine associations between expression of the
gene of interest and other genes from microarrays. Publicly available data from
the Helsinki Carotid Endarterectomy Study (HeCES) were analysed as a second
human validation cohort. Additional publicly available vascular and
nonvascular microarray data deposited in GEO were also analysed (as indicated by GSE
number in the corresponding figure legend), including studies of microdissected
atherosclerotic plaques and samples taken from individuals treated with
commercially available TNF-α inhibitors. A P value <0.05 was considered to indicate
significance.
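
As a rough illustration of the statistics described above, the following sketch computes a Pearson correlation coefficient and a per-test significance threshold adjusted for multiple comparisons. It is not the authors' pipeline: the function names and the toy expression values are hypothetical, and reading the 'Sidak-Bonferroni method' as the standard Šidák and Bonferroni threshold formulas is an assumption.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def sidak_alpha(alpha, n_tests):
    """Per-test threshold under the Sidak adjustment (assumed reading)."""
    return 1.0 - (1.0 - alpha) ** (1.0 / n_tests)


def bonferroni_alpha(alpha, n_tests):
    """Per-test threshold under the Bonferroni adjustment."""
    return alpha / n_tests


# Hypothetical log2 expression values for two transcripts across five samples.
cd47 = [7.1, 7.4, 8.0, 8.3, 8.9]
tnf = [5.0, 5.2, 5.9, 6.1, 6.8]
r = pearson_r(cd47, tnf)
```

The Šidák threshold is always slightly less strict than the Bonferroni threshold for the same number of tests, which is why the two are often mentioned together.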

Coronary artery samples. In this study, a total of 114 human coronary artery samples 
were used. 51 atherosclerotic epicardial coronary artery segments were harvested 
from 22 orthotopic heart transplant donors, as previously described”. Additionally, 
56 coronary artery segments were extracted from atherosclerotic intracoronary 
plaques in patients undergoing coronary atherectomy, as previously described*’. 
Briefly, longitudinal atherectomy was performed with the Silverhawk atherectomy 
catheter in arteries with flow-limiting stenosis after diagnostic angiography. RNA 
was isolated from both sets of specimens and hybridized to custom dual-dye gene 
expression microarrays representing approximately 22,000 features**°°. Briefly, the 
custom probe set was identified from data mining and curation of atherosclerosis 
and vascular cell-culture-based expression analyses and combined with the Agilent 
Human 1A and 1B arrays. In each microarray experiment, the expression levels 
of CD47 were determined by array-specific hybridization probe sets (~10 per 
transcript). Automated feature extraction software was used to filter out back- 
ground or saturated spot intensities to eliminate inherent probe and spatial biases 
during the fluorescent detection. This software computes the log) ratios and filters 
features by P value. These values were normalized according to protocol for each 
array, using a locally weighted linear regression curve fit (LOWESS) to correct for 
dye biases. This normalization procedure results in array specific log, ratios, and 
provides accurate measures of relative gene expression differences (http://www. 
affymetrix.com/support/technical/datasheets/human_datasheet.pdf). Normalized 
data from candidate transcripts were then used to calculate a Pearson correla- 
tion coefficient r assuming a Gaussian distribution, and two-tailed P values were 
calculated for each correlation coefficient. 

In addition to samples obtained for RNA analysis, an additional seven right 
coronary arteries were obtained from rapid autopsies from adult patients with a 
spectrum of coronary artery disease for histological analysis. The arteries were 
fixed with 4% paraformaldehyde (PFA) for several hours, and immersed in 30% 
sucrose at 4 °C overnight. The coronaries were then serially sectioned and segments 
of atheroma and relatively normal coronary artery were embedded in paraffin or 
OCT, and sectioned at 7-μm thickness.

For immunohistochemical staining, paraffin slides were deparaffinized and 
rehydrated with xylene and alcohol gradients, and antigen retrieval was performed 
in sodium citrate buffer (10 mM sodium citrate, 0.05% Tween 20, pH 6.0) using a
pressure cooker. The sections were blocked with 5% goat serum, and then stained
with primary antibodies (anti-CD47, Abcam B6H12.2, 2 μg ml⁻¹) or mouse IgG1
kappa (eBioscience, 2 μg ml⁻¹) at 4 °C overnight. The sections were washed in TBS,
incubated with MACH4 kit (Biocare Medical) according to the manufacturer's
instructions, and then detected using the Vulcan Fast Red Chromogen kit 2
(Biocare Medical). Staining for SMC content was accomplished by washing sections 
in water and PBS, followed by incubation with anti-SM22 alpha antibodies (Abcam, 
ab14106, 1:300). The sections were washed with PBS, incubated with Alexa Fluor 
488 goat anti-Rabbit (Life technologies, 1:250), washed, and mounted with 
Vectashield Mounting medium with DAPI (Vector Laboratories). Pictures were 
taken by Nikon digital camera mounted on an inverted fluorescence microscope. 
Serial sections were prepared as described above, and also stained with Masson's 
trichrome (Richard-Allan), or haematoxylin and eosin (Richard-Allan). For the 
staining of frozen sections, OCT was removed in water, and the sections were 
stained with Oil Red O (ORO, Sigma-Aldrich, 00625, 0.5%), haematoxylin 
and eosin (H&E, Richard-Allan), smooth muscle alpha-actin (Abcam, ab5694, 
1:300), HMGB1 (Abcam, ab18256, 1:100), CD206 (Abcam, ab64693, 1:50), CD47 
(Abcam B6H12.2 or Novus Biologicals, #NBP2-31106, 1:50) or mouse IgG1 kappa 
(eBioscience). Secondary antibodies included Alexa Fluor 594 goat anti-mouse 
(Life technologies, A11005, 1:300) and Alexa Fluor 488 goat anti-rabbit (Life 
technologies, A11034, 1:300). Antibody specificity was confirmed using isotype 
control and by preincubating the anti-CD47 antibodies with recombinant CD47 
antigen (R&D Systems) in a 1:5 ratio for 16h at 4°C before application. High 
resolution imaging of the carotid sections was performed as previously described**. 
Murine cardiovascular tissue. Atherosclerosis models. In the atherosclerosis studies 
described below, a total of 179 male apoE-deficient mice on the C57BL/6
background (apoE−/−, Jackson Laboratory, catalogue #002052) were used.

In the main atherosclerosis intervention studies, 8-week-old mice were 
implanted with subcutaneous Alzet minipumps (model 2004, Alzet Osmotic 
Pumps) containing Angiotensin II (AngII, Sigma-Aldrich, 1,000 ng kg⁻¹ min⁻¹)
and initiated on a high fat Western diet (21% anhydrous milk fat, 19% casein
and 0.15% cholesterol, Dyets no. 101511) for the ensuing 4 weeks, as previously
described. To determine the effect of CD47 signalling on vascular disease, mice
were injected with either 200 μg of the inhibitory anti-CD47 antibodies (MIAP410,
BioXcell, n = 18) or IgG1 control (MOPC-21, BioXcell, n = 20) IP QOD, at the
dose previously studied. The antibody therapy was started one day before the
pump implantation. Animals were observed daily, and in the case of premature 
sudden death, necropsy was performed to determine the cause of mortality. Blood 
pressure in conscious mice was measured at baseline (after standard acclimati- 
zation with a Visitech Systems Inc. machine), and weekly for the duration of the 
study. At 12 weeks of age, the mice were killed after an overnight fast, with serum 
and visceral organs (including the aortae) isolated and processed for analysis. 

In the chronic atherosclerosis studies, male apoE-deficient mice were weaned 
and initiated on a high fat diet at 4 weeks of age and maintained on this for the 
subsequent 12 weeks (without any angiotensin infusion). Antibody injection 
was performed as described above while the high fat diet was given (n= 9 per 
condition), and the animals were killed at the age of 16 weeks. 

In the established disease model, apoE-deficient mice were weaned onto a high 
fat diet at 4 weeks of age and continued on this for the ensuing 8 weeks (without 
any angiotensin infusion). At 12 weeks of age (after lesions had developed), mice 
were initiated on 200 μg of anti-CD47 antibodies (n = 14) or IgG1 (n = 13) IP QOD
for the ensuing 6 weeks, and killed at 18 weeks of age. 

In the TNF-α inhibitor synergy studies, 8-week-old apoE-deficient mice were
implanted with AngII pumps and maintained on Western high-fat diet, and then
randomized to one of four groups: (1) IgG group (mouse IgG and
human Fc, n = 10); (2) etanercept group (etanercept 0.2 mg per kg SQ weekly
(Amgen) and mouse IgG IP QOD, n = 8); (3) CD47 antibody group (MIAP410 50 μg
IP QOD and human Fc SQ weekly, n = 16); or (4) combination group (MIAP410
and etanercept, n = 19). In this model, the antibody treatment was started the
day before AngII pump implantation and delivered for 4 weeks, and the mice were 
euthanized at 12 weeks of age. Note that in this model, the anti-CD47 antibody 
dose was reduced by 75% to determine whether a lower dose of therapy could also 
affect atherogenesis. 

In the short-term intervention model, 8-week-old apoE-deficient mice were 
implanted with AngII pumps and maintained on Western high-fat diet without 
antibody therapy for the ensuing 23 days. Beginning at day 23, the mice received 
SQ injections of either: (1) IgG daily, n = 10; (2) etanercept (0.8 mg per kg at day 23,
n = 6); (3) anti-CD47 antibody (200 μg of MIAP410 daily, n = 11); or (4) combination
therapy, n = 11, between days 23 and 27. This cohort of mice was killed after
only 5 days of antibody treatment (at day 28) and was used to evaluate the effect of 
antibody therapy in established atherosclerotic plaques of identical size. 

To evaluate the effect of antibody therapy on atherosclerotic plaque vulnerability,
we used the recently described ‘tandem stenosis’ model. At 6 weeks of age,
apoE-deficient mice were initiated on high-fat diet and maintained on this for
the ensuing 6 weeks. At 12 weeks of age, tandem stenoses were introduced in a manner
shown to reproducibly alter shear stress and induce plaque rupture, as previously
described. Briefly, the mice were anaesthetized by isoflurane inhalation and an
incision was made to allow dissection of the right common carotid artery from the
circumferential connective tissues. Serial stenoses with a 150 μm outer diameter
were then introduced 1 mm and 4 mm from the carotid bifurcation. The stenosis
diameter was obtained by placing a 6-0 suture around the carotid artery together
with a 150 μm needle that was tied to it and later removed. Antibody therapy
was started the day before the surgery and continued thereafter. Mice were killed 
7 weeks after the surgery and intraplaque haemorrhage was quantified within the 
processed tissue sections. 

In addition to the mice treated with anti-CD47 antibody or control antibody, 
a separate cohort of 12 male apoE−/− mice were fed a high-fat diet for either 8,
12 or 20 weeks, but were not treated with antibody nor implanted with osmotic 
minipumps. These mice were used to determine aortic gene expression changes 
during atherogenesis, using the RNA analysis methods described below. In these 
experiments, comparison was made to control C57BL/6 mice fed standard chow 
diet for 24 weeks. 

Finally, a cohort of 3 apoE−/− mice were implanted with AngII osmotic pumps and fed
a high-fat diet for 4 weeks (but not treated with antibody) and then injected with
biotin-labelled anti-CD47 antibodies 24 and 6 h before being killed, to determine
where the therapeutic antibody accumulates in vivo. Nonatherosclerotic C57BL/6
and CD47-deficient mice (Cd47−/−, Jackson Laboratory, catalogue #003173) were
also injected with the biotin-labelled antibodies, and served as controls. All animals 
were analysed as described below. 

Vascular tissue preparation, immunohistochemistry and atherosclerotic lesion 
quantification. Aortic atherosclerosis lesion area was determined as described 
previously. Briefly, the arterial tree was perfused with PBS and then fixed
with 4% PFA. The heart and the full-length of the aorta-to-iliac bifurcation was 
exposed and dissected carefully from any surrounding tissues. Thoracic aortas 
were then opened along the ventral midline and dissected free of the animal and 
pinned out flat, intimal side up, onto black wax. Aortic images were captured 
with a digital camera mounted on a Nikon stereomicroscope and analysed using 
Adobe Photoshop CS5 software. The percentage of lesion area was calculated as 
total lesion area divided by total surface area. The atherosclerotic lesions within 
the aortic valve area (aortic sinus) were analysed as described previously. The
samples were perfused with PBS, fixed with 4% PFA, embedded in OCT, and
sectioned at 7-μm thickness. Four sections at 100-μm intervals were collected
from each mouse and stained with ORO, Masson's trichrome, haematoxylin and
eosin, smooth muscle α-actin (SMA, Abcam, ab5694, 1:300), Mac-3 (BD Sciences,
BD 550292, 1:100), CD-3 (Abcam, ab5690, 1:150), and Ly-6G (BD Sciences, BD 
551459, 1:300). Atherosclerosis burden was quantified from the luminal aspect of 
the blood vessel through the plaque to the internal elastic lamina (that is, lipid in 
the neointima was quantified). Necrotic core size was quantified by calculating the 
area of the lesion which was acellular on Masson's trichrome staining, as previously
described. Plaque haemorrhage was quantified by determining the presence
or absence of red blood cells (TER-119, Santa Cruz Biotechnology) within the
plaque, as previously described. Subsequent immunohistochemical studies were
quantified from the luminal aspect of the blood vessel through the plaque to the 
external elastic lamina (to assess changes which also involved the tunica media). 
To detect the localization of injected biotin-labelled anti-CD47 antibodies, the 
avidin-biotin complex technique was used. Frozen sections of aortic sinus were 
prepared from the mice injected with biotin-labelled anti-CD47 antibodies, as 
described above. Endogenous peroxidase activity was blocked by incubation with
0.3% hydrogen peroxide for 30 min, and the sections were washed with water and
PBS, followed by blocking with 5% goat serum for 30 min. Biotin was detected
using Vectastain ABC kit and DAB substrate kit per protocol (Vector Laboratories).
Corresponding aortic sinus sections from the mouse without biotin antibody
injection were used as negative controls, as were Cd47−/− mice which had been injected
as above. For immunofluorescent staining of these samples, sections were blocked
with 5% goat serum for 30 min, then incubated with Streptavidin-Alexa Fluor
546 conjugate (Life Technologies, 1:300) and Mac-3 (BD, 1:100) for 1h, followed 
by Alexa Fluor 488 donkey anti-rat (Life Technologies, 1:300). In vivo apoptosis 
was assessed by staining for TUNEL positivity with the Cell Death Detection Kit 
(Roche), per protocol, and confirmed with cleaved caspase 3 (Cell Signaling #9661, 
1:200) staining followed by Alexa Fluor 488 goat anti-rabbit (Life technologies, 
1:250). The cleaved-caspase-3-positive area was measured and quantified using 
Adobe Photoshop, and the percentage of positive area was calculated as total 
caspase-3-positive area divided by total atherosclerotic plaque area measured by
ORO staining in the serial sections. To calculate the in vivo phagocytic index, we
performed double staining of cleaved caspase 3 (detected with Alexa Fluor 488 
goat anti-rabbit antibody) and Mac-3 (detected with Alexa Fluor 594 goat anti-rat
antibody (Life Technologies, 1:250)). The number of free apoptotic cells not
associated with a macrophage (indicated by a star (Fig. 2f)) was manually assessed
in a blinded fashion, and compared to apoptotic cells associated with a macrophage
(indicated by an arrow), as previously described. For phospho-SHP1 staining, the
sections were stained with phospho-SHP1 antibodies (Abcam, ab131500, 1:50) and 
Mac-3 followed by Alexa Fluor. The phospho-SHP1-positive area was normalized 
to Mac-3-positive area. All lesion areas and indices were measured and quantified 
using Adobe Photoshop by a blinded observer. Samples collected from several 
tissue beds were also snap-frozen in liquid nitrogen for subsequent mRNA and 
protein expression analysis, as described below. 
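
The area-based and count-based measures described in this section reduce to simple ratios. The sketch below shows one way to express them; the function names and example numbers are hypothetical, area units are arbitrary (for example, pixels from an image editor), and treating the in vivo phagocytic index as the ratio of free to macrophage-associated apoptotic cells is an assumed reading of the text rather than a confirmed definition.

```python
def percent_area(positive_area, total_area):
    """Positive area as a percentage of a reference area.

    Applies to lesion area over total aortic surface area, or to
    cleaved-caspase-3-positive area over total plaque area.
    """
    if total_area <= 0:
        raise ValueError("total_area must be positive")
    return 100.0 * positive_area / total_area


def free_to_associated_ratio(free_cells, associated_cells):
    """Ratio of free apoptotic cells to macrophage-associated apoptotic cells.

    A higher value suggests less efficient clearance (assumed interpretation).
    """
    if associated_cells <= 0:
        raise ValueError("associated_cells must be positive")
    return free_cells / associated_cells


# Hypothetical measurements for a single section.
lesion_percent = percent_area(positive_area=25.0, total_area=200.0)
clearance_ratio = free_to_associated_ratio(free_cells=6, associated_cells=3)
```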

Vascular tissue electron microscopy. Electron microscopy was performed in 
the Stanford University Cell Sciences Imaging Facility, as previously described”. 
Briefly, samples were fixed and processed using standard histologic techniques 
then imaged using a JEOL JEM-1400 transmission electron microscope. Ingested 
apoptotic bodies (indicated by white arrow (Fig. 2g)), free apoptotic bodies 
(indicated by yellow arrow), and apoptotic bodies undergoing secondary necrosis 
(indicated by red arrow) were qualitatively assessed in a blinded manner, as 
previously described”. 

Serum and plasma analysis. Serum chemistry, lipid, complete blood count, 
and differential analyses were performed by the Stanford Animal Diagnostic 
Laboratory, as previously described”. In brief, blood samples were collected by 
cardiac puncture after an overnight fast. Automated haematology was performed 
on the Sysmex XT-2000iV analyser system. Blood smears were prepared for 
all full complete blood count samples and reviewed by a medical technologist. 
Chemistry analysis was performed on the Siemens Dimension Xpand analyser, 
and included analyses of renal function, electrolyte levels, liver function tests, 
fasting glucose levels and fasting lipid panels. A medical technologist performed 
all testing, including dilutions and repeat tests as indicated, and reviewed all data. 
Serum insulin levels were measured by ELISA kit according to the manufacturer's 
instruction (EMD Millipore). 

Griess reaction. The activity of nitric oxide synthase was evaluated using a 
modified Griess assay. Lung samples were collected from mice and snap-frozen 
in liquid nitrogen before homogenization in PBS. The nitrate and nitrite levels were
measured by Ultrasensitive Colorimetric Assay for nitric oxide synthase (Oxford
Biomedical Research) according to the manufacturer's instructions, and
standardized by the protein amount.

Hybrid Mouse Diversity Panel. The Hybrid Mouse Diversity Panel (HMDP), 
which includes a quantitative analysis of 109 classical and recombinant inbred 
mouse strains**, was used to identify factors associated with vascular CD47 
expression, in vivo. Briefly, whole aorta from the arch to the mid-abdomen was 
snap-frozen at the time of death and total RNA was isolated using the RNeasy 
kit (Qiagen), as described*’. Genome-wide expression profiles were determined 
by hybridization to Affymetrix HT-MG_430 PM microarrays on a subset of
female mice from 104 strains (n = 2 aorta per strain). Quantification of plasma
cytokines was carried out in a multiplexed immune-capture microbead system
(Milliplex Mouse Cytokine/Chemokine Magnetic Bead Panel MCYTOMAG-70K,
EMD Millipore) as per the manufacturer’s instructions. Cytokines profiled were:
G-CSF, GM-CSF, IFN-γ, IL-1α, IL-1β, IL-2, IL-4, IL-6, IL-7, IL-10, IL-12 (p40),
IL-12 (p70), IL-13, IL-15, IP-10, KC, MCP-1, MIP-1α, MIP-1β, M-CSF, MIP-2,
MIG, RANTES, TNF-α. Plasma insulin was measured using the mouse insulin
ELISA kit (80-INSMS-E01, Alpco) as per the manufacturer’s instructions. Pearson's 
correlations were generated to calculate transcript-transcript and transcript-trait 
correlations. Using these methods, the genes and plasma cytokines that were 
significantly associated with aortic CD47 expression levels were identified. 

In silico bioinformatics methods. Pathway analysis. Genome-wide correlation 
analyses were performed to identify genes that are significantly correlated with 
CD47 expression in the human and murine vascular tissue collections described 
above. These lists were intersected to identify genes that are commonly
co-expressed across multiple data sets in both species. The resulting list of genes
(Extended Data Fig. 7a) was subjected to a series of bioinformatics analyses
including the Database for Annotation, Visualization and Integrated Discovery
(DAVID), Kyoto Encyclopedia of Genes and Genomes (KEGG), Gene Ontology
(GO), and Protein Analysis Through Evolutionary Relationships (PANTHER)
classification. Additionally, genes were mapped to open chromatin regulatory
intervals in primary human vascular cells and analysed using the Genomic
Regions Enrichment of Annotations Tool (GREAT). Pathways found to be
over-represented were ranked by P value. Statistical overrepresentation and enrichment
analyses were performed using Bonferroni correction for multiple testing
(P < 0.05 cutoff).
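
The list intersection and multiple-testing filter described above can be sketched in a few lines. This is an illustrative outline only: the function names, gene symbols and pathway P values are hypothetical, and the actual analyses were run in the named tools (DAVID, KEGG, GO, PANTHER, GREAT), whose internal procedures may differ.

```python
def intersect_gene_lists(*gene_lists):
    """Genes present in every supplied co-expression list, sorted by name."""
    sets = [set(genes) for genes in gene_lists]
    if not sets:
        return []
    return sorted(set.intersection(*sets))


def bonferroni_significant(p_values, alpha=0.05):
    """Keep entries whose Bonferroni-adjusted P value is below alpha.

    p_values maps a pathway name to its raw P value; the adjustment
    multiplies each raw value by the number of pathways tested.
    """
    n_tests = len(p_values)
    return {name: p for name, p in p_values.items() if p * n_tests < alpha}


# Hypothetical co-expression lists from a human and a mouse data set.
human_genes = ["CD47", "TNF", "NFKB1", "IL6"]
mouse_genes = ["TNF", "CD47", "CCL2"]
shared = intersect_gene_lists(human_genes, mouse_genes)

# Hypothetical raw pathway P values.
hits = bonferroni_significant({"pathway_a": 0.001, "pathway_b": 0.03})
```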

Ingenuity Pathway Analysis (IPA) was then performed on the resulting
intersected CD47 co-expression gene list from human carotid atherosclerosis 
(BiKE study) and murine aortic atherosclerosis (HMDP study). Briefly, the 63 gene 
identifiers and expression values were analysed using the Core pathway analysis 
after removing any duplicates and unmapped identifiers. The resulting networks 
were then subjected to an Upstream Regulators Analysis using the Ingenuity 
Knowledge Base after applying a filter to include all genes, RNAs and proteins, 
while excluding chemicals or drugs. Overlap P values were calculated for each 
upstream regulator using a Fisher's exact test and activation z-scores were
calculated by comparing observed direction of target genes with inferred
literature-derived regulatory direction to identify the most significant upstream regulators
for CD47. 
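
To make the two statistics named above concrete, the sketch below computes a right-tailed Fisher's exact (hypergeometric) overlap P value and a simple activation z-score. The function and parameter names are hypothetical, and the z-score formula (the difference between consistent 'activated' and 'inhibited' target counts divided by the square root of their total) is a simplified, unweighted form assumed here for illustration; the commercial software may apply additional weighting or bias correction.

```python
from math import comb, sqrt


def overlap_p_value(overlap, regulator_targets, dataset_genes, background):
    """Right-tailed Fisher's exact P value for a gene-set overlap.

    Probability of seeing at least `overlap` regulator targets when
    `dataset_genes` genes are drawn from `background` genes, of which
    `regulator_targets` are known targets of the regulator.
    """
    max_overlap = min(regulator_targets, dataset_genes)
    total = comb(background, dataset_genes)
    tail = sum(
        comb(regulator_targets, i)
        * comb(background - regulator_targets, dataset_genes - i)
        for i in range(overlap, max_overlap + 1)
    )
    return tail / total


def activation_z_score(n_activated, n_inhibited):
    """Unweighted z-score from counts of targets whose observed direction
    is consistent with activation versus inhibition of the regulator."""
    n = n_activated + n_inhibited
    if n == 0:
        raise ValueError("at least one target with a known direction is required")
    return (n_activated - n_inhibited) / sqrt(n)


# Hypothetical example: 5 of 5 dataset genes are targets, from a background of 10.
p = overlap_p_value(overlap=5, regulator_targets=5, dataset_genes=5, background=10)
z = activation_z_score(n_activated=9, n_inhibited=0)
```

A small overlap P value indicates that the regulator's known targets are enriched in the gene list, whereas the sign of the z-score indicates the predicted direction of regulation.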

Promoter analysis. Using the UCSC browser, the genetic sequence 1 kb upstream
of the CD47 promoter was identified and analysed, as previously described.
In addition to evaluating for open chromatin status and DNase
hypersensitivity sites, potential transcription factor binding sites (TFBS) were predicted using
the following online bioinformatics tools: TRANSFAC (BIOBASE), P-Match, 
TFSearch, Alibaba, PROMO, and MatInspector. High confidence binding sites 
(85% likelihood cutoff) were accepted for additional analysis. Additionally, the 
predicted binding sites in the CD47 promoter region were intersected with open 
chromatin peaks identified from the assay for transposase-accessible chromatin
followed by sequencing (ATAC-seq) in primary human coronary artery SMCs 
(HCASMC). 

Cell culture. Primary vascular SMCs were collected from the aortas of C57BL/6 
mice and propagated in DMEM supplemented with 10% FBS, as previously 
described. Human coronary artery SMCs (HCASMC, Lonza Catalog CC-2583,
passage 3-6) were propagated in SmGM-2 growth media (Lonza) containing 5%
FBS. To obtain human macrophages, leukocyte reduction system (LRS)
chambers were obtained from the Stanford Blood Center from anonymous donors.
Monocytes were purified on an autoMACS Pro Separator (Miltenyi) using
whole-blood anti-CD14 microbeads (Miltenyi) and differentiated to macrophages by
culture for 7-10 days in IMDM + GlutaMax (Invitrogen) supplemented with 10%
AB-human serum (Gemini Bio-Products 100-512) and 100 U ml⁻¹ penicillin
and 100 μg ml⁻¹ streptomycin (Invitrogen). RFP+ mouse macrophages were
generated and evaluated as previously described. Briefly, bone-marrow cells
were isolated from C57BL/Ka Rosa26 mRFP1 transgenic mice and differentiated
in IMDM + GlutaMax supplemented with 10% fetal bovine serum, 100 U ml⁻¹
penicillin and 100 μg ml⁻¹ streptomycin, and 10 ng ml⁻¹ murine M-CSF
(Peprotech). Mouse yolk sac endothelial cell line (C166, ATCC, CRL-2581) and
mouse macrophage cell line (RAW 264.7, ATCC, TIB-71) were grown in
DMEM-growth media containing 10% FBS, while mouse T-lymphocyte cell line (EL4,
ATCC, TIB-39) were grown in DMEM containing 10% horse serum. Human 
macrophage cell line (THP1, ATCC, TIB-202) were grown in RPMI-1640 medium 
containing 10% FBS and 0.05 mM 2-mercaptoethanol. Human embryonic kidney 
cells (HEK-293, ATCC, CRL-1573) used for luciferase reporter assays were grown 
in DMEM-growth media containing 10% FBS. No additional cell authentication 
or mycoplasma contamination testing was performed. 

A variety of atherosclerosis-related or pro-apoptotic stimuli were applied to 
the cells in the experiments described below including: oxidized LDL (oxLDL,
50 μg ml⁻¹, Alfa Aesar), Ang II (100 nM, Sigma-Aldrich), fibroblast growth factor
(FGF, 100 ng ml⁻¹, R&D), platelet-derived growth factor (PDGF, 100 ng ml⁻¹,
R&D), and lipopolysaccharide (LPS, 1 μg ml⁻¹, Sigma-Aldrich). Additionally,
a number of cytokines associated with CD47 expression through the HMDP
Luminex array were also tested, including tumour necrosis factor α (TNF-α,
50 ng ml⁻¹, R&D), interleukin 2 (IL-2, 100 ng ml⁻¹, Biolegend), chemokine
receptor ligand 1 (CXCL1, 100 ng ml⁻¹, Biolegend), interleukin 4 (IL-4, 50 ng ml⁻¹,
Biolegend), and transforming growth factor β (TGF-β, 50 ng ml⁻¹, R&D). Before
experimentation, SMCs were serum-starved for 24 h in DMEM and then stimulated
with the stimuli listed above for 24 h before analysis. In some experiments, the cells
were stimulated with staurosporine (STS, 1 μM, Sigma-Aldrich) for 1 or 4 h to
induce apoptosis, after 24 h of TNF-α treatment.

To inhibit TNF-α signalling, a chemical inhibitor (SPD 304, Sigma Aldrich) 
or a monoclonal antibody (infliximab, Janssen) was used. Briefly, TNF-α was 
pre-incubated with 10 µM of SPD304 or 100 µg ml⁻¹ of infliximab in serum-free 
DMEM for 20 min before cell stimulation. To inhibit NF-κB signalling, the cells 
were pre-treated with 10 µM BAY 11-7085 (Santa Cruz Biotechnologies) or DMSO, 
then stimulated with TNF-α. 

For MAPK western blotting experiments, mouse aortic SMCs were serum- 
starved for 48 h, pre-treated with 10 µg ml⁻¹ of CD47 antibodies or IgG for 20 min, 
then stimulated with thrombospondin-1 (TSP1, 10 µg ml⁻¹, R&D) for 10 or 30 min. 
For eNOS western blotting experiments, C166 cells were serum-starved for 8h, 
pre-treated with 2 µM TSP1 with or without 10 µg ml⁻¹ of CD47 antibody or IgG 
for 20 min, then stimulated with acetylcholine (Ach, 10 µM, Sigma-Aldrich) for 
15 min. 

mRNA isolation and quantitative reverse-transcription PCR. RNA was isolated 
from cell lysates using the miRNeasy Mini Kit (Qiagen) according to the manu- 
facturer’s protocol. RNA was isolated from murine organ samples using the Trizol 
method (Invitrogen). RNA was quantified with the Nanodrop machine (Agilent 
Technologies). For quantitation of gene transcription, cDNA was generated with 
MultiScribe reverse transcriptase (Applied Biosystems), and then amplified on 
the ABI PRISM 7900HT with commercially available TaqMan primers (Applied 
Biosystems) and normalized to 18S internal controls, as previously described. 
A list of the primers and probes used in these studies is provided in Extended 
Data Table 1e. 
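The normalization to 18S described above can be illustrated with a short sketch. It assumes the commonly used 2^-ΔΔCt calculation, which the text does not name explicitly; the function name and all Ct values below are hypothetical and serve only to show the arithmetic.

```python
def fold_change(ct_target, ct_reference, ct_target_base, ct_reference_base):
    """Relative expression by the 2^-ddCt method (an assumption, not stated in the text).

    ct_target / ct_reference: Ct of the gene of interest and of 18S in the sample.
    ct_target_base / ct_reference_base: the same two values in the baseline condition.
    """
    delta_ct_sample = ct_target - ct_reference
    delta_ct_base = ct_target_base - ct_reference_base
    return 2.0 ** -(delta_ct_sample - delta_ct_base)


# Hypothetical Ct values: the baseline condition is compared with itself,
# and a treated sample crosses threshold 1.5 cycles earlier for the target gene.
baseline = fold_change(26.0, 10.0, 26.0, 10.0)
treated = fold_change(24.5, 10.0, 26.0, 10.0)
print(baseline, round(treated, 2))
```

By construction the baseline condition evaluates to 1, which matches the convention of reporting fold-change relative to a baseline set as '1'.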

Protein extraction and western blotting. Total protein was isolated from cultured 
cell lines and tissue homogenates using 1× cell lysis buffer (Cell Signaling) sup- 
plemented with 1× Halt Protease & Phosphatase Single-Use Inhibitor Cocktail 
(Thermo Scientific), as previously described. The protein concentration in each 
sample was measured using Pierce BCA Protein Assay Kit (Thermo Scientific). 
Equal amounts of protein were loaded and separated on precast gels (Bio-Rad) and 
thereafter transferred onto PVDF membranes (Bio-Rad). Following a 1h incuba- 
tion in 5% bovine serum albumin solution prepared in 1× TBST, these membranes 
were probed with commercially available antibodies designed to recognize endog- 
enous P38 (Cell Signaling, 1:1000), phospho-P38 (Cell Signaling, 1:1000), ERK1/2 
(Cell Signaling, 1:1000), phospho-ERK1/2 (Cell Signaling, 1:1000), phospho-eNOS 
(Cell Signaling, 1:1000), CD47 (Novus Biologicals, 1:1000) and GAPDH (1:1000; Cell 
Signaling Technologies) overnight at 4°C. Membranes were rinsed with TBST and 
incubated with appropriately matched horse-radish peroxidase (HRP)-conjugated 
anti-mouse (1:5000; Life Technologies) or anti-rabbit (1:5000; Life Technologies) 
antibodies for 1h, before protein expression was detected using SuperSignal 
West Pico Chemiluminescent substrate (Thermo Scientific). Membranes were 
then scanned with a Licor Odyssey Fc imager for quantitative analysis. In some 
experiments, membranes loaded with protein were incubated with anti-CD47 anti- 
body that had been preabsorbed with CD47 peptide (R&D systems, 1866-CD, 1:5 
dilution) for 16h, to determine the specificity of the primary antibody. 
Apoptosis assays. To evaluate apoptosis, the luminometric Caspase-Glo 3/7 
Assay (Promega, G8090) was performed on cultured cells, according to the 
manufacturer’s protocol. Briefly, mouse aortic SMCs were seeded in 96-well plates 
at the density of 10,000 cells per well, grown at 37°C for 24h, and then serum- 
starved for 24h. Apoptosis was induced with 1 µM STS treatment for 4h in the 
presence of 10 µg ml⁻¹ of anti-CD47 antibodies or IgG. Confirmatory assays were 
performed by flow cytometry, where cells were exposed to 24 or 72h of vehicle, 
50 ng ml⁻¹ of TNF-α, or 50 µg ml⁻¹ of oxLDL, then treated with 1 µM STS for 
4h before being collected in TrypLE. These cells were stained with anti-annexin 
V antibody labelled with fluorescein isothiocyanate (FITC) and propidium iodide 
(eBioscience) and analysed by Scanford cell analyser (Stanford Shared facility, 
Stanford), as previously described. These FACS data were analysed with FlowJo 
10.1r5. 

Proliferation assays. A modified MTT (3-[4,5-dimethyl-thiazol-2-yl]-2,5- 
diphenyltetrazolium bromide) assay was performed to analyse SMC proliferation 
and viability. Mouse aortic SMCs were seeded in 96-well plates at the density of 
6,000 cells per well, grown at 37°C overnight, and then serum-starved for 48h. 
The cells were stimulated with 10% serum or 10 µg ml⁻¹ TSP1 with or without 
10 µg ml⁻¹ of anti-CD47 antibodies or IgG for 24h and then incubated for 4h in 
the presence of 10 µl of MTT AB solution (Millipore). The formazan product was 
dissolved by addition of 100 µl acidic isopropanol (0.04 N HCl) and absorbance 
was measured at 570 nm (reference wavelength 630 nm) on a SpectraMax 190 
Microplate Reader (Molecular Devices). 
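As an illustration of how dual-wavelength MTT readings are often reduced to a proliferation index, the sketch below subtracts the 630 nm reference reading from the 570 nm reading and expresses treated wells relative to control wells. The subtraction step, the function names and the absorbance values are assumptions made for the example; the text states only the two wavelengths.

```python
def corrected_absorbance(a570, a630):
    """Signal at 570 nm minus the 630 nm reference reading (assumed correction)."""
    return a570 - a630


def relative_viability(treated_wells, control_wells):
    """Mean corrected absorbance of treated wells as a percentage of control wells.

    Each well is an (a570, a630) pair.
    """
    treated = [corrected_absorbance(a, ref) for a, ref in treated_wells]
    control = [corrected_absorbance(a, ref) for a, ref in control_wells]
    return 100.0 * (sum(treated) / len(treated)) / (sum(control) / len(control))


# Hypothetical plate readings, for illustration only.
control_wells = [(0.50, 0.10), (0.54, 0.10)]
serum_wells = [(0.94, 0.10), (0.94, 0.10)]
viability = relative_viability(serum_wells, control_wells)
print(round(viability, 1))
```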

Efferocytosis assay. Standard in vitro phagocytosis assays were performed as previ- 
ously described. SMCs were labelled with 2.5 µM carboxyfluorescein succinimidyl 
ester (CFSE) according to the manufacturer’s protocol (Invitrogen). 100,000 SMCs 
were plated per well in a 96-well plate (Corning) and pre-incubated with anti- 
body (IgG1 isotype control (MOPC-21) or MIAP410 (anti-CD47)) for 30 min 
at 37°C. An unrelated anti-CD8 antibody was also tested as a negative control. 
After 30 min, 50,000 macrophages were added to each well and co-incubated 
for 2h in serum-free medium, then analysed using an LSRFortessa cell analyser 
with high throughput sampler (BD Biosciences). RFP⁺ mouse macrophages were 
identified by intrinsic fluorescence. Dead cells were excluded from the analysis 
by staining with DAPI (Sigma). Phagocytosis was evaluated as the percentage of 
GFP⁺ macrophages using FlowJo X 10.0.7r2 (Tree Star) and was normalized to the 
maximal response by each independent donor against each cell line. In addition 
to measuring basal efferocytosis rates, experiments were repeated with target cells 
that had been pretreated with a variety of compounds (alone or in combination) 
including STS, oxLDL, TNF-α, TSP1, and infliximab. Confirmatory assays were 
performed with RAW macrophages where target cells were labelled with 1 µM of 
CellTracker Deep Red dye (Life technologies) and phagocytes were labelled with 
1.25 µM of CellTracker Orange CMRA (Life technologies) at 37 °C for 30 min. As 
above, cells in these assays were treated with 50 µg ml⁻¹ of oxLDL, 50 ng ml⁻¹ of 
TNF-α, or 100 µg ml⁻¹ of infliximab antibody for 24h before co-culture in serum- 
free medium with 10 µg ml⁻¹ of IgG1 isotype control (MOPC-21) or 
anti-CD47 (MIAP410) antibody for 2h at 37°C. Double-positive cells were quantified using 
the Scanford cell analyser (Stanford Shared facility) and analysed by FlowJo 10.1r5, 
as previously described. Statistical significance was determined by one-way or 
two-way ANOVA with Bonferroni’s correction using Prism 5 (Graphpad). 
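The two-step readout described for the efferocytosis assay (percentage of macrophages positive for the target-cell label, then normalization to the maximal response within a donor) can be sketched as follows. The function names and event counts are hypothetical, and the sketch assumes normalization is applied across the conditions tested for one donor and one target cell line.

```python
def percent_positive(double_positive, total_macrophages):
    """Percentage of macrophages scored as having engulfed a labelled target cell."""
    return 100.0 * double_positive / total_macrophages


def percent_of_max(percent_by_condition):
    """Rescale one donor's conditions so that the largest response equals 100."""
    peak = max(percent_by_condition.values())
    return {name: 100.0 * value / peak for name, value in percent_by_condition.items()}


# Hypothetical event counts for a single donor (illustrative only).
donor = {
    "IgG": percent_positive(600, 5000),
    "anti-CD47": percent_positive(1500, 5000),
    "anti-CD8": percent_positive(570, 5000),
}
normalised = percent_of_max(donor)
print({name: round(value, 1) for name, value in normalised.items()})
```

Normalizing within each donor removes donor-to-donor differences in absolute phagocytic capacity, so that conditions can be compared on a common 0 to 100 scale.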

Flow cytometry. To measure the cell surface expression of CD47, cells were 
exposed to vehicle or 50 ng ml⁻¹ of TNF-α for 24h. In some experiments, cells were 
treated with 1 µM STS during the last 1h or 4h of the incubation, before analysis. 
The cells were collected in TrypLE (Life Sciences) and stained with anti-CD47 
antibodies (AbD Serotec MCA2514GA, clone 1/1A4, 1:400) or IgG (eBioscience), 
followed by Alexa Fluor 488 goat anti-mouse (Life Technologies, 1:400), and 
then FACS-sorted within 1h (BD FACSCalibur, 530 nm fluorescence [FL1] and 
>575 nm [FL3]). Analysis was performed with FlowJo 7.6.3. 
Immunocytochemistry. Primary mouse aortic SMCs were seeded at approx- 
imately 60% confluence in glass-bottom culture dishes (MatTek Corporation). 
Following treatment with TNF-α and/or STS, as described above, cells were rinsed 
with PBS and fixed with freshly prepared 4% PFA (Fisher Scientific). Once per- 
meabilized with 0.1% Triton X-100 (Sigma), cells were incubated with blocking 
buffer (3% BSA, Cell Signaling Technology) for 1h and then incubated overnight 
at 4°C with a CD47 antibody (1:80; R&D Systems, Cat# AF1866-SP). After rins- 
ing with PBS, cells were incubated in the dark with a donkey anti-goat Alexa 
Fluor 594 conjugate secondary antibody (1:1000; Life Technologies) for 1h then 
briefly incubated with DAPI. CD47 and DAPI localization was captured at 20× 
magnification using a Leica DMI3000 B microscope capable of taking fluorescent 
images. Studies using primary human coronary artery SMCs were also performed 
as above, but used 5% goat serum blocking buffer (ThermoFisher Scientific), 
and the following primary and secondary antibodies: CD47 antibody (Novus 
Biologicals, 1:50); HMGB1 (Abcam, ab18256, 1:100); Alexa Fluor 594 goat anti- 
mouse (Life Technologies A11005, 1:300); and Alexa Fluor 488 goat anti-rabbit 
(Life Technologies, A11034). In some studies, exogenous CD47 peptide was prein- 
cubated with the cells before the CD47 antibody was applied, as described above. 
Luciferase reporter assay. CD47 LightSwitch Promoter Reporter GoClones 
(RenSP, S710450), empty promoter vectors (S790005) and Cypridina TK Control 
constructs (pTK-Cluc, SN0322S) were obtained from SwitchGear Genomics 
and transfected into HEK cells using Lipofectamine 2000 (Invitrogen). For 
overexpression assays, expression plasmids for Nfkb1 (p50), Rela (p65), Nfkb2 
(p52), and c-Rel were obtained from Addgene (#21965, #21966, #23289, #27256, 
respectively). Empty vector pCMV4 was generated by HindIII digestion of 
#21966, and pcDNA3.1 was obtained from Life Technologies. 50 ng of plasmid 
was co-transfected with 45 ng of the RenSP reporter and 5 ng of the pTK-Cluc 
reporter construct. Media was changed to fresh DMEM and 50 ng ml⁻¹ of TNF-α 
was added 2, 8, 24, and 36 h before collection. The cell lysate and supernatant were 
collected 48h after transfection and dual luciferase activity was measured with the 
LightSwitch Dual Assay System using a SpectraMax L luminometer (Molecular 
Devices), according to the manufacturer’s instructions. Relative luciferase 
activity (Renilla/Cypridina luciferase ratio) was quantified as the percentage change 
relative to the basal values obtained from control-transfected cells not exposed to 
TNF-α treatment. 
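The relative luciferase calculation described above (Renilla/Cypridina ratio, expressed as the percentage change from untreated control-transfected cells) can be written out as a minimal sketch. The function names and luminescence readings are hypothetical and illustrate the arithmetic only.

```python
def relative_luciferase(renilla, cypridina):
    """Reporter signal normalized to the co-transfected Cypridina control."""
    return renilla / cypridina


def percent_change(sample_ratio, basal_ratio):
    """Percentage change of a sample ratio relative to the basal (untreated) ratio."""
    return 100.0 * (sample_ratio - basal_ratio) / basal_ratio


# Hypothetical luminescence readings (arbitrary units), for illustration only.
basal = relative_luciferase(1000.0, 500.0)
treated = relative_luciferase(3000.0, 500.0)
change = percent_change(treated, basal)
print(basal, treated, change)
```

Dividing by the Cypridina signal corrects for well-to-well differences in transfection efficiency before conditions are compared.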

Chromatin immunoprecipitation. Chromatin immunoprecipitation (ChIP) 
was performed according to the Millipore Magna-ChIP protocol with slight 
modifications. HCASMC were cultured in normal growth media until approx- 
imately 75% confluent. Cells were fixed in 1% formaldehyde for 10 min to 
cross-link chromatin, followed by quenching with glycine for 5 min at room 
temperature. 2 × 10⁷ cells per condition were collected, and nuclear lysates 
were prepared according to the manufacturer’s protocol. Cross-linked chroma- 
tin nuclear extracts were sheared into approximately 500 bp fragments using a 
Bioruptor (Diagenode) for 3 cycles of 3 min (30 s on, 30 s off). Sheared chromatin 
was clarified via centrifugation at 4°C for 10min. 1 × 10⁶ nuclei per condition 
were incubated with 2 µg rabbit IgG or anti-NF-κB p105/p50 antibody (Abcam, 
Ab7971) plus protein A/G magnetic beads overnight at 4°C on a rotating platform 
to capture the protein-DNA complexes. Complexes were washed in low salt, high 
salt, LiCl, and TE buffers and then eluted with a ChIP Elution Buffer containing 
Proteinase K. Free DNA was subsequently purified using spin columns. Total 
enrichment was measured using primers designed based on the sequence of 
the top NF-κB binding site within the CD47 promoter (forward (−804 to −781): 
5′-ATAGGGAAGAGCAGAGCGAGTAGA-3′ and reverse (+627 to +609): 
5′-GCGTGGACCAGGACACCTA-3′), or a negative control region using the 
following primers: forward: 5′-CCGGAAGCACTTCTCCTAGA-3′ and reverse: 
5′-AAGAGAGAGCGGAAGTGACG-3′. Quantitative real-time PCR (ViiA 7, Life 
Technologies) was performed using SYBR Green (Applied Biosystems) assays and 
fold enrichment was calculated by measuring the ΔΔCt − ΔΔCt IgG. Melting curve 
analysis was also performed for each ChIP primer. Data are presented as the per- 
centage of input DNA and as fold enrichment of chromatin precipitated with the 
NF-κB antibodies relative to the control IgG. In some experiments, cells were 
treated with TNF-α for 90 min and 24h before isolation of nuclear lysates. 
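The two ChIP-qPCR readouts mentioned above, percentage of input and fold enrichment over IgG, are commonly derived from Ct values as in the sketch below. The exact formulas used in the study are not given in the text, so the calculation, the 1% input fraction, the function names and the Ct values are all assumptions for illustration.

```python
def percent_input(ct_ip, ct_input, input_fraction):
    """Percentage of input recovered in the immunoprecipitate.

    input_fraction is the share of chromatin saved as input (0.01 means 1%);
    this value is hypothetical and not stated in the text.
    """
    return 100.0 * input_fraction * 2.0 ** (ct_input - ct_ip)


def fold_enrichment(ct_ip, ct_igg):
    """Signal of the specific antibody relative to the IgG control."""
    return 2.0 ** (ct_igg - ct_ip)


# Hypothetical Ct values, for illustration only.
pct = percent_input(ct_ip=24.0, ct_input=25.0, input_fraction=0.01)
fold = fold_enrichment(ct_ip=24.0, ct_igg=27.0)
print(pct, fold)
```

Both formulas assume roughly 100% amplification efficiency, so each cycle of difference corresponds to a twofold difference in template.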
Statistical analysis. Aside from the microarray data (described above), all 
experimental data are presented as mean ± s.e.m. Data were subjected to the 
Kolmogorov-Smirnov test to determine distribution. Groups were compared using 
the Mann-Whitney U test for non-parametric data or the two-tailed Student’s t-test 
for parametric data. When comparing multiple groups, data were analysed by 
one-way analysis of variance (ANOVA) followed by Tukey’s or Dunnett’s post- 
hoc test. For multiple testing of parametric data, a value of P < 0.05 was considered 
statistically significant. In vitro experiments were replicated at least in triplicate 
and all analyses were performed in a blinded fashion by two separate investigators, 
unless otherwise specified. In the in vivo intervention studies, comparison was 
made between mice treated with anti-CD47 antibodies and IgG. In the in vivo 
synergy studies, ANOVA was performed as above with multiple comparison and 
linear trend post-testing across all four groups. In the in vitro studies, comparison 
was made between the intervention (anti-CD47 antibodies) and control (IgG) 
arms. ‘Vehicle control’ was only used in experiments where an additional treatment 
(for example, TNF-α, oxLDL, infliximab) was studied. In the TaqMan-based CD47 
expression experiments, changes were demonstrated as mRNA fold-change com- 
pared to the baseline condition (set as ‘1’). In the microarray-based experiments, 
relative expression differences were displayed across conditions (for example, 
atherosclerosis versus no atherosclerosis). In the in vitro phagocytosis assays, effe- 
rocytosis rates are displayed as percent of maximum, as previously described. 
Statistical analysis was performed with GraphPad Prism 5. Aside from the human 
plaque microarray studies (displayed as Tukey box plots) and the correlation plots 
(displayed as the 95% confidence band of the best fit line), all error bars display 
the standard error of the mean. 
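For readers less familiar with the summary statistic used for the error bars, the following sketch computes a mean and its standard error (sample standard deviation divided by the square root of n) with the Python standard library. The function name and the replicate values are hypothetical; the study itself reports using GraphPad Prism 5.

```python
import math
import statistics


def mean_sem(values):
    """Return (mean, standard error of the mean) for a list of replicates."""
    if len(values) < 2:
        raise ValueError("at least two replicates are needed to estimate the s.e.m.")
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return statistics.mean(values), sem


# Hypothetical triplicate measurements, for illustration only.
mean, sem = mean_sem([1.0, 2.0, 3.0])
print(round(mean, 3), round(sem, 3))
```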

Study approval. All animal studies were approved by the Stanford University 
Administrative Panel on Laboratory Animal Care (protocol 27279) and conform 
to the Guide for the Care and Use of Laboratory Animals published by the US 
National Institutes of Health (NIH Publication No. 85-23, revised 1996). All human 
studies were performed with written informed consent and with the approval of 
the Ethical Committee of Northern Stockholm (BiKE). 
Extended Data Figure 1 | CD47 expression correlates with risk for 
clinical cardiovascular events and is progressively upregulated in the 
necrotic core of human blood vessels during atherogenesis. a, cDNA 
microarray expression profiling in the BiKE carotid endarterectomy 
biobank reveals that the relative expression of CD47 is increased in 
vascular homogenates taken from subjects with symptomatic disease 
(stroke or transient ischaemic attack, n = 85) compared to those with 
stable, asymptomatic lesions (n= 40). Similar findings were observed 
in the non-overlapping discovery and validation cohorts from BiKE 
(n=55), and a second validation cohort from the Helsinki Carotid 
Endarterectomy Study (HeCES, n = 21). Data presented as Tukey box 
plots. b, Immunohistochemical staining reveals that CD47 co-localizes 
with lipidated plaque within human coronary lesions, as measured by 
Oil-Red-O (ORO) staining. c, Immunofluorescence staining of coronary 
samples confirms that CD47 is upregulated within the necrotic core. 

d, High magnification (40×) imaging of atherosclerotic coronary plaque 
confirms that CD47 expression is present on the surface of nucleated cells. 
Specificity of the anti-CD47 antibody is confirmed in assays where the signal was quenched 
by preincubating the sections with recombinant CD47 peptide before 
primary antibody exposure. e, Additional representative coronary artery 
segments spanning the spectrum of progressive coronary artery disease 
(non-atherosclerotic coronary, early ‘fatty streak’, inwardly remodelled 
plaque, and advanced ulcerated lesion with necrotic core) confirm that 
CD47 is progressively upregulated during the development of coronary 
artery disease. The tunica media is indicated by dotted lines. f, Additional 
staining in human carotid artery sections confirms that CD47 expression 
is upregulated in atherosclerosis relative to healthy tissue, and appears 
most pronounced within the necrotic core. g, High magnification (100×) 
imaging confirms that the CD47 expression is specific to lesional cells, 
including SMCs (α-SMA), macrophages (CD68) and cells undergoing 
programmed cell death (Casp3). Comparisons made by two-tailed t-tests. 
**P < 0.01, *P < 0.05. Original magnification, ×40 (d), ×4 (f, g). 
Extended Data Figure 2 | CD47 expression is increased in mouse 
models of atherosclerosis. a, Mice injected with biotin-labelled 
anti-CD47 antibodies reveal that these antibodies accumulate in 
the vasculature of atherosclerotic mice (middle), relative to non- 
atherosclerotic control mice (left). No staining is detected in Cd47−/− mice 
(right), indicating specificity of the antibody. b, Western blotting of tissue 
homogenates obtained from wild-type and Cd47−/− mice (with and 
without quenching CD47 peptide) further confirms the specificity of the 
antibody. For gel source data, see Supplementary Fig. 1. c, High resolution 
immunofluorescence staining of murine atherosclerotic plaques 
indicates that CD47 is specifically expressed on the surface of lesional 
cells, rather than extracellular debris. Original magnification, ×40. 
d, Publicly available microarray data from laser capture microdissected 
(LCM) vascular tissue reveals that CD47 expression is increased within 
the macrophage and foam-cell-rich area of human plaque, relative to 
macrophage and foam-cell-poor areas (GSE23303). e, Similar results 
were observed in LCM tissue from mouse atherosclerotic plaque tissue, 
relative to non-atherosclerotic medial and adventitial tissue (GSE21419). 
f, g, Additional results from the Gene Expression Omnibus (GEO) 
database reveal that aortic CD47 expression is upregulated in murine 
models of atherosclerosis, as observed in the current study (GSE2372 and 
GSE19286). **P < 0.03, *P < 0.05. 
Extended Data Figure 3 | Anti-CD47 antibody reduces atherosclerotic 
burden in several orthogonal in vivo models. a, Study timeline 
detailing osmotic minipump implantation and high-fat feeding to 
induce atherosclerosis in the apoE−/− and ‘angiotensin infusion’ model 
used herein. Kaplan-Meier curves indicate no change in mortality 
with anti-CD47 treatment during 28 days of follow up. b, c, Additional 
representative examples confirm that anti-CD47 antibody reduces 
atherosclerosis content in the aortic sinus (b) and reduces the percentage 
of the en face aorta covered by atherosclerotic plaque (c). d-g, Several 
additional atherosclerosis models were also used in this study to confirm 
the beneficial effects of anti-CD47 antibody therapy, and to model 
additional aspects of human cardiovascular disease, including a ‘chronic 
atherosclerosis’ model, where antibody therapy was given for 12 weeks 
(with no angiotensin infusion) (d); a ‘plaque vulnerability’ model, where 
the impact of antibody therapy on plaque rupture and intraplaque 
haemorrhage was quantified (e); an ‘established disease’ model, where 
therapy was given for 7 weeks after mice had already developed advanced 
plaques of equivalent size (f); and a ‘reduced dose’ model, where the dose 
of anti-CD47 antibody was reduced by 75%, relative to the preceding 
studies (g). h, Additionally, a ‘short term’ study was performed where mice 
with established lesions of equivalent size and identical apoptosis rates 
were pulsed for only 5 days with anti-CD47 antibodies before collection, 
to quantify the effect of therapy on efferocytosis rates, independent of 
lesion size (phagocytic index indicated by the ratio of ‘free’ (white stars) to 
‘associated’ (white arrows) apoptotic bodies). Additional methodological 
details are provided in the Methods. Comparisons made by two-tailed 
t-tests. **P < 0.03, *P < 0.05. Error bars represent the s.e.m. Original 
magnification, ×4 (b, d-f), ×2 (c), ×10 (h). 
Extended Data Figure 4 | Anti-CD47 antibody promotes the 
phagocytosis of diseased SMCs and macrophages, without altering 
apoptosis. a, In vitro caspase activity assays reveal that anti-CD47 
antibody does not alter rates of programmed cell death in any vascular 
cell type. b, Flow cytometry assays confirm that anti-CD47 antibody has 
no effect on apoptosis at baseline, or in vascular SMCs exposed to 24 or 
72h of oxLDL. c, Staining controls for the in vitro phagocytosis assays. 
d, Representative FACS plots for the in vitro efferocytosis conditions 
displayed in Fig. 2e. The right upper quadrant (highlighted in red) 
includes double-positive cells that are taken to represent a macrophage 
that has ingested a target cell. e, In vitro efferocytosis assays using lipid- 
loaded macrophages as the target cell confirm that anti-CD47 antibody 
also stimulates the clearance of this vascular cell type, similar to the 
findings observed with SMCs. f, Additional in vitro efferocytosis assays 
confirm that anti-CD47 antibody stimulates phagocytosis of vascular cells 
in a specific manner. Error bars represent the s.e.m. 
Extended Data Figure 5 | Additional examples confirm the pro- 
efferocytic properties of anti-CD47 antibody in vivo. a, Additional 
representative images detail that mice treated with anti-CD47 antibodies 
have a lower overall burden of apoptotic debris (caspase in green), as well 
as fewer examples of ‘free’ apoptotic bodies (white stars). Those apoptotic 
bodies that are present in these lesions are more often found in close 
proximity to macrophages (Mac-3 in red) and are considered ‘associated’ 
with a phagocyte if physically co-localized (white arrows). b, Additional 
electron microscopy examples provide further qualitative evidence 
that phagocytes present in the lesions of mice treated with anti-CD47 
antibodies are more likely to have ingested several apoptotic bodies (white 
arrows) compared to lesions from IgG treated mice which are more likely 
to have a high burden of ‘free’ apoptotic bodies (yellow arrows). 
c-e, Additional representative examples of the necrotic core analysis (c), 
the phospho-SHP1 staining (d), and the plaque haemorrhage analysis (e) 
are shown, as described in the Methods. Original magnification, 
×10 (a, d), ×4 (c, e). 
Extended Data Figure 6 | Full-dose anti-CD47 antibody therapy induces 
anaemia, but does not appear to alter NO elaboration, TSP1-dependent 
signalling, or other processes relevant to vascular biology. 
a, No significant change in blood pressure is observed between mice 
treated with IgG or anti-CD47 antibodies, arguing against a systemic 
difference in nitric oxide (NO) production due to antibody therapy. 

b, Direct measurement of pulmonary NO release via the Griess reaction 
indicates that anti-CD47 antibody does not increase NO elaboration 

in vivo. c, Western blot analysis of cultured murine vascular cells reveals 
that anti-CD47 antibody has no effect on the expected induction of p38 
and ERK phosphorylation secondary to TSP1 treatment. d, Similarly, 
anti-CD47 antibody has no effect on TSP1-dependent inhibition of 

eNOS phosphorylation, nor acetylcholine-dependent induction of eNOS 
phosphorylation. e, MTT assays show that anti-CD47 antibody does 

not affect cellular proliferation rates in the presence of TSP1. f, In vitro 
efferocytosis assays show that the expected basal increase in phagocytosis 
observed after apoptotic cells are exposed to TSP1 (black bars) is not 
altered in the presence of anti-CD47 antibodies (red bars). g, Compared 
to mice receiving control IgG, mice receiving anti-CD47 antibody 
treatment have similar body weights at baseline and at time of killing. 

h, No difference is observed for the weight of any organ between groups, 
with the exception of splenomegaly observed in the anti-CD47-treated 
animals. i, Histological analysis of the explanted splenic tissue reveals an 
increase in the red pulp of anti-CD47 treated mice without any change in 
fibrosis or white pulp content, suggestive of increased erythrophagocytosis 
in this reticuloendothelial organ. j-l, Dot plots detail the haemoglobin 
count (j), reticulocyte count (k) and circulating monocyte count (l) for 
each animal in the acute 4-week angiotensin-infusion atherosclerosis 
model. Note that this anaemia appears to be self-limited, and no anaemia 
was observed in the chronic atherosclerosis model or the reduced dose 
model (P= 0.54 and 0.57, respectively). m, mRNA analysis of aortic tissue 
reveals that anti-CD47 antibody has no significant effect on the expression 
of macrophage-polarization factors in vivo. n, Anti-CD47 antibody also 
has no effect on the aortic expression of any other candidate efferocytosis 
genes. o-r, Additional quantitative analyses reveal that anti-CD47 
antibody has no effect on in vivo neutrophil content (as assessed by Ly6G⁺ 
area normalized to lesion size) (o); macrophage content (as assessed by 
Mac-3⁺ area normalized to lesion size) (p); T-cell content (as assessed 
by CD3⁺ area across the lesion and adventitia) (q); or smooth muscle 
cell content (as quantified by α-SMA⁺ area in the aortic sinus from the 
external elastic lamina to the lumen) (r). s, t, Anti-CD47 antibody also 
had no effect on lipid level (s) or serum insulin (t). u, MTT assays reveal 
that anti-CD47 antibody has no effect on the proliferation of primary 
aortic SMCs obtained from apoE−/− mice either at baseline (left) or in 
the presence of 10% serum (right). Comparisons made by two-tailed 
t-tests, unless otherwise specified. ***P < 0.001, **P < 0.01, *P < 0.05. 
Error bars represent the s.e.m. For gel source data, see Supplementary 
Fig. 1; for detailed serological data, see Extended Data Table 1. Original 
magnification, ×4. 
Extended Data Figure 7 | Additional bioinformatic and experimental 
analyses further implicate a central role for TNF-α in vascular CD47 
signalling. a, Cytoscape network visualization of the genes which are 
significantly correlated with CD47 expression in both human and 
murine atherosclerotic plaque reveals a high number of TNF-α-related 
factors (indicated in blue), including ligands, receptors, and downstream 
signalling factors. b, PANTHER pathway analysis of those genes which 
were significantly associated with CD47 expression in mouse and human 
vascular tissue and have been previously associated with atherosclerosis 
through the STAGE study identifies ‘inflammation mediated by 
chemokine and cytokine signaling pathway’ as the most over-abundant 
pathway associated with CD47 expression in vascular tissue. c, Using the 
Hybrid Mouse Diversity Panel (HMDP), which correlates aortic gene 
expression with Luminex cytokine array data of plasma samples from 
over 100 inbred strains of mice, we found that vascular CD47 expression 
is positively correlated with three inflammatory cytokines in vivo, 
including TNF-α, IL-2 and CXCL1. Correlation data shown for CD47 
and TNF-α. d, Co-expression studies confirm that TNF-α and CD47 
expression are positively correlated in human carotid endarterectomy 
samples from the BiKE validation study. The Pearson correlation 
coefficient was determined assuming a Gaussian distribution and 
P values were determined using a two-tailed test. e, Experiments with 
primary cultured mouse aortic SMCs indicate that TNF-α reproducibly 
induces Cd47 mRNA upregulation, whereas a number of other classical 
pro-atherosclerotic stimuli have no significant effect. Notably, CXCL1, 
IL-4, TGF-β and IL-2 fail to induce CD47 expression in vitro, as assessed 
by ANOVA. f, Additional studies suggest that the effect of TNF-α on 
CD47 expression persists in the presence of oxLDL, as occurs in the 
atherosclerotic plaque. g, Western blotting confirms that TNF-α induces 
CD47 expression in vascular cells at the protein level. For gel source data, 
see Supplementary Fig. 1. h, Immunocytochemistry studies of HCASMCs 
confirm that CD47 expression is induced on the cell surface of TNF-α 
treated cells. TNF-α effect is assessed by co-staining for HMGB1, and 
antibody specificity is confirmed with isotype control and recombinant 
CD47 peptide quenching assays. i, Multiple assays (including FACS, 
TaqMan and immunocytochemistry studies) reveal that CD47 expression 
is downregulated on vascular SMCs during programmed cell death, as has 
previously been observed with inflammatory cells. j, Confirmatory assays 
in cultured human coronary artery SMCs reveal that TNF-α induces 
changes similar to those observed in murine cells (Fig. 3d), including an 
induction of CD47 under physiological conditions and a blunting of its 
expected downregulation during apoptosis. Original magnification, ×20. 
k, The capacity of TNF-α to impair CD47 downregulation during 
programmed cell death is also observed in mouse SMCs simultaneously 
exposed to pro-apoptotic stimuli and oxLDL. l, No correlation between 
CD47 and other candidate cytokines was observed in the BiKE biobank, 
further supporting a specific relationship between CD47 and TNF-α. 
m, Representative FACS-based apoptosis panels from cells exposed to the 
conditions used in Fig. 3g confirm that TNF-α suppresses efferocytosis 
(Fig. 3g) despite increasing programmed cell death. Comparisons made 
by two-tailed t-tests, unless otherwise specified. ***P < 0.001, *P < 0.05. 
Error bars represent the s.e.m. 
Extended Data Figure 8 | The CD47 promoter contains predicted 
binding sites for the TNF-α-related transcription factor NF-κB1. 
a, UCSC genome browser screenshot showing overlay of human 
CD47 transcript with ENCODE transcription factor binding sites 
(including RELA, E2F4, and SRF), along with the active H3K27ac 
histone modification ChIP-seq track, and a custom track for chromatin 
accessibility in HCASMCs using the assay for transposase accessible 
chromatin followed by sequencing (ATAC-seq). These chromatin, 
DNase hypersensitivity sites, and published ChIP-seq data suggest that 
members of the NF-κB family of transcription factors could regulate 
CD47 expression in vascular tissue. b, Additional co-expression studies 
in the BiKE validation study confirm that NFKB1 and CD47 expression 
are positively correlated in human carotid endarterectomy samples. The 
Pearson correlation coefficient was determined assuming a Gaussian 
distribution and P values were determined using a two-tailed test. 
c, Additional luciferase promoter reporter assays reveal that induction 
of CD47 expression requires the presence of NF-κB1 and cannot be 
induced by other NF-κB co-factors such as RELA or c-REL. d, e, Time- 
course studies confirm that CD47 expression is induced by TNF-α 
within 24h, suggesting a direct transcriptional relationship (TaqMan 
mRNA expression assays (d); luciferase reporter assays (e)). f, Additional 
chromatin immunoprecipitation studies confirm that NF-κB1 protein 
binds the CD47 promoter within 90 min of TNF-α treatment in human 
coronary artery SMCs. **P < 0.01, *P < 0.05. Error bars represent the 
s.e.m. 
Extended Data Figure 9 | Dual inhibition of CD47 and TNF-α provides
a combinatorial effect. a, Pretreatment of mouse vascular SMCs with
a chemical inhibitor (SPD 304) or a monoclonal antibody (infliximab)
directed against TNF-α prevents the increase in CD47 expression
normally seen after TNF-α exposure. b, Similar effects were observed with
the NF-κB inhibitor, BAY 11-7085, confirming the molecular mechanism
outlined in Fig. 4. c, Mice injected for four weeks with the decoy TNF-α
receptor, etanercept, display a significant reduction in their in vivo
expression of CD47 in splenic (left) and renal (right) tissue. d, Publicly
available microarray data from human clinical trials of commercially
available TNF-α inhibitors reveal that subjects treated with these agents
also express lower levels of CD47 in vivo (as assessed by two-tailed
t-tests), confirming the mouse findings above (GSE accession numbers
from left to right: 16879 (n = 85), 12251 (n = 22), 47751 (n = 28) and
41663 (n = 66)). e, f, Additional in vitro efferocytosis assays confirm
a synergistic effect of anti-CD47 antibodies with a variety of TNF-α
inhibitors in both the absence (e) and presence (f) of exogenous TNF-α.
g, Mice with established plaques of identical size and with equivalent rates
of apoptosis were treated with a short course (5 days) of IgG, anti-CD47
antibodies, etanercept, or combination therapy before collection. As
shown, the phagocytic index (indicated by the ratio of ‘free’ (white stars) to
‘associated’ (white arrows) apoptotic bodies) displayed a non-significant
trend towards improvement for combination therapy (P > 0.05).
h, When treated for a full 28 days in the angiotensin-infusion model,
individual comparisons showed that etanercept alone had no effect on
atherosclerosis, and combination therapy was not significantly
different from anti-CD47 alone, probably due to the potent effect of
anti-CD47 monotherapy. ANOVA post-hoc test analysis did identify
a significant linear trend across all four groups (P for trend < 0.01).
i, Electron microscopy provides additional qualitative evidence that
combination therapy may provide an incremental effect on efferocytosis,
as suggested by an increased prevalence of macrophages within the plaque
which had ingested a large number of apoptotic bodies (white arrows),
a reduced prevalence of free apoptotic bodies (yellow arrows), and a
reduced prevalence of uncleared cells undergoing secondary necrosis
(red arrows). ***P < 0.001, **P < 0.01, *P < 0.05. Error bars represent
the s.e.m.
Extended Data Table 1 | In vivo serological data and additional in silico and bioinformatic data
a, Complete serological studies (including blood count, liver function studies, basic metabolic panel and fasting glucose) from the 4-week apoE−/−-AngII atherosclerosis model indicate that anti-CD47
antibody induces a significant reduction in haemoglobin and compensatory reticulocytosis, consistent with previous reports*’. The erythrophagocytosis of senescent red blood cells appears to be
self-limited, and no anaemia was observed in the chronic atherosclerosis model or the reduced dose model (P = 0.54 and 0.57, respectively). No significant difference in any other serum marker is
observed except for an increase in serum creatinine, which does not deviate outside of the reference range. Metabolic parameters and leukocyte differential data from the 12-week chronic
atherosclerosis model are displayed at the bottom of the table. b, Additional Upstream Regulator Analysis (URA) bioinformatic analyses of the Cytoscape data displayed in Extended Data Fig. 7a
performed within the Ingenuity Pathway Analysis (IPA) software identify a number of TNF-α-related factors (indicated in red) which are predicted to mediate transcriptional regulatory roles in the
gene network shown in that panel. P values were determined by Fisher’s exact test by comparing overlap of co-expressed genes with known upstream regulators from the Ingenuity Knowledge Base.
c, Several additional DAVID-based bioinformatics analyses (including KEGG, SMART, PANTHER and GO analyses) confirm the association between CD47 and inflammatory signalling related to the
TNF-α pathway (indicated in red). Blue panels indicate the −log10(P value) for each identified factor. d, Transcription factor binding site prediction algorithms identify several putative NF-κB family
binding sites within the CD47 promoter, as displayed in Extended Data Fig. 8a. e, List of primers used in this study.
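
The overlap P values and the −log10(P value) scale mentioned in this legend can be illustrated with a minimal sketch. It assumes a right-tailed (enrichment) Fisher's exact test on a gene-set overlap; the legend does not state which tail was used, and the function names and toy counts below are hypothetical, not taken from the study.

```python
from math import comb, log10


def overlap_enrichment_p(overlap: int, set_a: int, set_b: int, universe: int) -> float:
    """Right-tailed Fisher's exact P value for a gene-set overlap.

    Returns the hypergeometric probability of seeing at least `overlap`
    shared genes when a set of size `set_b` is drawn from `universe`
    genes, of which `set_a` belong to the regulator's known targets.
    The choice of the right tail is an assumption of this sketch.
    """
    if not 0 <= overlap <= min(set_a, set_b):
        raise ValueError("overlap must lie between 0 and the smaller set size")
    if max(set_a, set_b) > universe:
        raise ValueError("sets cannot be larger than the universe")
    total = comb(universe, set_b)
    tail = sum(
        comb(set_a, k) * comb(universe - set_a, set_b - k)
        for k in range(overlap, min(set_a, set_b) + 1)
    )
    return tail / total


def neg_log10(p_value: float) -> float:
    """Convert a P value to the -log10 scale used for display."""
    return -log10(p_value)


# Toy numbers only: 5 of 5 co-expressed genes are known targets,
# out of a universe of 10 genes containing 5 targets.
p = overlap_enrichment_p(overlap=5, set_a=5, set_b=5, universe=10)
print(p, neg_log10(p))
```

A larger −log10 value corresponds to a smaller P value, which is why the strongest associations appear as the longest bars on such a scale.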
Neoantigen landscape dynamics during human
melanoma-T cell interactions 


Els M. E. Verdegaal', Noel F. C. C. de Miranda2, Marten Visser!, Tom Harryvan!, Marit M. van Buuren*+, Rikke S. Andersen‘+, 
Sine R. Hadrup‘*t, Caroline E. van der Minne!, Remko Schotte®, Hergen Spits®°, John B. A. G. Haanen?’, Ellen H. W. Kapiteijn!, 


Ton N. Schumacher? & Sjoerd H. van der Burg! 


Recognition of neoantigens that are formed as a consequence 
of DNA damage is likely to form a major driving force behind 
the clinical activity of cancer immunotherapies such as T-cell 
checkpoint blockade and adoptive T-cell therapy!~’. Therefore, 
strategies to selectively enhance T-cell reactivity against genetically 
defined neoantigens!*""! are currently under development. In 
mouse models, T-cell pressure can sculpt the antigenicity of 
tumours, resulting in the emergence of tumours that lack defined 
mutant antigens!”!?, However, whether the T-cell-recognized 
neoantigen repertoire in human cancers is constant over time is 
unclear. Here we analyse the stability of neoantigen-specific T-cell 
responses and the antigens they recognize in two patients with 
stage IV melanoma treated by adoptive T-cell transfer. The T-cell- 
recognized neoantigens can be selectively lost from the tumour cell 
population, either by overall reduced expression of the genes or loss 
of the mutant alleles. Notably, loss of expression of T-cell-recognized 
neoantigens was accompanied by development of neoantigen- 
specific T-cell reactivity in tumour-infiltrating lymphocytes. 
These data demonstrate the dynamic interactions between cancer 
cells and T cells, which suggest that T cells mediate neoantigen 
immunoediting, and indicate that the therapeutic induction of 
broad neoantigen-specific T-cell responses should be used to avoid 
tumour resistance. 

Cytotoxic CD8+ T cells can specifically recognize and eliminate
tumours'*"'*. The clinical activity of antibodies against the T-cell 
checkpoints CTLA-4 and PD-1 underscores the notion that in many 
patients a tumour-specific T-cell response is present'”. Similarly, 
the clinical effects of autologous adoptive cell transfer (ACT) rely 
on the presence of tumour-reactive T-cell populations!**?°. Cancer 
genome-screening strategies to dissect T-cell responses in these treated 
patients provide strong evidence for a role of neoantigen recognition. 
Neoantigen-specific CD4+ and CD8+ T-cell responses are frequently
observed in melanoma and gastrointestinal tract tumours*>!!!, T-cell 
responses against neoantigens can be enhanced by CTLA-4 and PD-1 
blockade*’, and mutational load correlates with response to checkpoint 
blockade in non-small cell lung cancer, melanoma, and tumours with 
mismatch repair deficiency**®. Finally, a recent case report provides 
direct evidence for the clinical potential of neoantigen-specific T-cell 
reactivity’. 

These data, and the fully tumour-restricted expression of neoantigens,
make neoantigens an attractive target for novel cancer immunotherapies. In
support of this concept, neoantigen vaccines were effective in mouse 
tumour models and immunogenic when injected in humans’*’. 
Neoantigen-specific T-cell transfer strategies are also being developed”. 

However, loss of a defined neoantigen resulted in tumour cell 
resistance in a transplantable tumour model!?. Whether loss of
neoantigens also occurs in human disease is unclear. Therefore, we
analysed the changes in expression and recognition of neoantigens 
targeted by tumour-specific T cells during the follow-up of two 
patients with stage IV melanoma treated with ACT. The first patient 
(patient BO) presented with stage IV melanoma with multiple (>9) 
subcutaneous, lymph node and lung lesions. One subcutaneous lesion 
was resected to establish melanoma cell line MEL05.18. One year later, 
the disease progressed and the patient developed a brain metastasis 
that was partially resected and used to establish both a melanoma 
line (MEL06.07) and tumour-infiltrating lymphocytes (TIL06.07). 
At this point, ACT treatment was initiated, using a tumour-specific 
T-cell product expanded by repeated stimulation of peripheral blood 
mononuclear cells (PBMC) with MEL05.18. This treatment led to 
a complete response that has been ongoing for more than 9 years
(Fig. 1a).

The infusion product comprised 70% CD8+ T cells. Stimulation
with MEL05.18 showed that >80% of the CD8+ and >15% of the
CD4+ T cells were tumour reactive’*. Analysis of CD8+ T-cell reactivity
using a panel of >200 major histocompatibility complex (MHC)
multimers containing known shared tumour-associated epitopes!*”4
demonstrated that only a minor fraction of tumour reactivity could be
explained by recognition of these shared antigens (1.24% of the CD8+
T cells responded to three different gp100 epitopes; data not shown).
The dominant recognition of unique patient-specific (private) antigens
was also suggested by stimulation of the T cells used for infusion
with a panel of (partially) human leukocyte antigen (HLA)-matched
melanoma lines, showing exclusive recognition of the autologous
tumour (Fig. 1b).

To identify the antigens recognized, we analysed MEL05.18 for 
non-synonymous somatic mutations within expressed genes and 
found that MEL05.18 expressed 501 non-synonymous mutated genes. 
Autologous antigen-presenting cells”> loaded with synthetic 31-mer 
peptides covering these mutations were used to assess T-cell recognition.

In the infusion product used to treat patient BO, we observed CD4+
T-cell responses against RPS12(V104I), ZC3H18(G269R) (Fig. 1c
and ref. 3) and TNIK(S502F)°, and CD8+ T-cell responses against
KIAA0020(P451L) (KIA(P451L)) and ribosomal protein RPL28(S76F)
(Fig. 1d). The combined frequency of CD8+ T cells that recognize these
two mutant epitopes equalled the total tumour-reactive CD8+ T-cell
pool in the infusion product (Fig. 1d, e).

To address the stability of the neoantigen repertoire recognized by
CD8+ T cells, we compared neoantigen reactivity in peripheral blood
and resected tumour tissue. CD8+ T-cell responses against both of the
identified neoepitopes were detectable ex vivo in the blood of patient
BO that was sampled before ACT. Their frequency increased after
ACT (Fig. 1f) and eventually contracted following complete tumour
Figure 1 | Identification and dynamics of neoepitope-specific T-cell
reactivity and neoepitope expression in patient BO. a, Clinical course,
tissue and cell line collection. CR, complete response; PD, progressive
disease. b, T-cell reactivity against autologous (asterisk) and a panel
of (partially) HLA class I and II matched melanoma cell lines by IFNγ
release. The mean ± s.d. of a representative experiment performed in
duplicate is depicted. c–e, Neoantigen-specific CD137+ cell frequency
among CD4+ (c) or CD8+ T cells (d). Autologous tumour cells and B
cells served as positive and negative controls for CD4+ (white bars) and
CD8+ (black bars) T cells (e). The mean ± s.d. of at least two independent
experiments is shown. Asterisk indicates significant increase above
negative control (P < 0.05; Student's t-test). f, Percentage of
IFNγ-producing KIA(P451L)-specific (middle) and RPL28(S76F)-specific
(right) CD8+ cells in peripheral blood before and 3 weeks after ACT.


regression (not shown). Surprisingly, although the TIL06.07 culture
isolated from the brain tumour resected just before ACT comprised
almost exclusively CD8+ T cells, T-cell reactivity was only observed
against the KIA(P451L) neoantigen, and not against RPL28(S76F)
(Fig. 1g). Whereas the mutation and expression of the KIAA0020 gene
(also known as PUM3) were present in both cell lines (Fig. 1i, j),
g, TIL06.07 reactivity against the identified neoantigens, MEL05.18
and MEL06.07. h, Reactivity of enriched RPL28(S76F)-specific T cells
against neoantigens, MEL05.18 and MEL06.07. Autologous B cells served
as negative controls (f–h). Mean ± s.d. of a representative experiment
performed in triplicate is shown (g, h). Asterisks indicate significant
increase above negative control (P < 0.01; Student’s t-test). i, Sanger
sequencing of the indicated genes in cell lines and tissue. The arrows
indicate the non-synonymous mutations resulting in the identified
neoantigens. WT, wild type. j, Fold change in RNA expression of KIAA0020
in MEL06.07 relative to MEL05.18; mean ± s.d. of two independent
experiments is shown. k, Allele-specific expression of the wild-type (blue)
and mutant (red) allele of the RPL28 gene. Grey dashed line depicts
cut-off value for specific expression. l, Loss of heterozygosity (LOH) near the
RPL28 gene in both MEL06.07 and tumour tissue 06.07.


absence of RPL28(S76F)-specific T cells coincided with absence
of the mutant allele encoding the RPL28(S76F) neoantigen within
the MEL06.07 cell line (Fig. 1i, k). To assess if the cell lines were
representative of the corresponding tumour samples, Sanger sequencing
of the RPL28 mutation was performed in archival formalin-fixed tissues
and was consistent throughout (Fig. 1i). Loss of heterozygosity (LOH)
analysis by amplification of polymorphic markers around the RPL28
gene confirmed loss of the mutant RPL28 allele in tumour tissue 06.07,
whereas both alleles were present in early tumour tissue 05.18 and
normal tissue (Fig. 1l). The presence of a single detectable RPL28 allele
in 06.07 tumour tissue indicates the clonal outgrowth of a tumour
subclone with LOH at this locus, and is consistent with absence of the
RPL28(S76F) mutation within the MEL06.07 cell line and 06.07 tissue
(Fig. 1l). In line with these findings, T cells enriched for RPL28(S76F)
reactivity responded to MEL05.18 but not to the RPL28(S76F) loss
variant MEL06.07 (Fig. 1h).

The loss of a neoantigen recognized by T cells in a sequential tumour 
lesion is consistent with the concept of immunoediting. Furthermore, 
the data suggest that accumulation of neoantigen-specific T cells at 
the tumour site reflects local antigen presence. To address this, we 
investigated neoantigen expression and reactivity in sequential lesions 
from a second patient. Patient AB was surgically treated for metastatic 
melanoma that spread to the axillary lymph nodes. From the resected 
tumour tissue, melanoma line MEL04.01 was established and used to 
expand tumour-specific T cells from the peripheral blood. Two years 
later, the patient developed a progressive liver lesion that was treated 
by ACT, resulting in disease stabilization for 3 months according 
to response evaluation criteria in solid tumours (RECIST1.1) and 
6 months according to immune-related response criteria (iRC)*®. 
Subsequently, the liver lesion was resected and used to generate 
TIL08.10. The patient remained disease-free for a period of 4 years 
and then developed multiple subcutaneous, lung and bone lesions. Two 
subcutaneous lesions, 12.07 and 12.09, were resected for the isolation 
of TIL12.07 and TIL12.09, respectively, and to establish melanoma line 
MEL12.07 (Fig. 2a). 

The infusion product of patient AB comprised 70% CD8+ T cells,
and 54% of these CD8+ T cells were reactive against MEL04.01 used
to generate the infusion product, whereas no reactivity was observed
within the CD4+ T-cell fraction**. Again, dominant recognition of
private antigens by infused T cells was suggested by the limited CD8+
T-cell reactivity against shared antigens'*** (<0.04% of CD8+ T cells
infused into patient AB recognized a shared PRDX5 or EPHA2 epitope;
data not shown) and exclusive recognition of the autologous tumour
within a panel of (partially) HLA-matched melanoma lines (Fig. 2b).

MEL04.01 of patient AB displayed 226 expressed non-synonymous 
mutations and the infusion product used for ACT comprised CD8+ T
cells specific for the neoantigens EML1(R64W), SEPT2(R300C), and 
CAD(R1854Q) (Fig. 2c). In all cases, no or only weak recognition of 
the corresponding wild-type peptides was observed (data not shown). 

Independent TIL cultures from tumour lesions of patient AB, 
resected before and after ACT, were screened for recognition of all 
putative neoepitopes present within the original MEL04.01 tumour 
(Fig. 2d-g). Interestingly, TILs obtained before ACT (TIL04.01) did 
not react against any of the three identified neoantigens and showed 
only weak reactivity against the MEL04.01 tumour from which they 
were derived. To assess the fate of the infused neoantigen-specific 
T-cell populations and the antigens they recognize, TIL08.10 obtained 
two years after therapy, and TIL12.07 and TIL12.09 obtained six years 
after therapy were analysed. A low-level T-cell response against the
EML1(R64W) neoantigen was detectable in all three TIL preparations,
although not significantly above background in TIL12.07. Notably,
within the 12.07 and 12.09 TIL preparations, no significant T-cell
reactivity was detectable against the CAD(R1854Q) neoantigen, and
reactivity against the SEPT2(R300C) neoantigen was absent from all
post-ACT TIL preparations (Fig. 2e–g). Thus, in this patient immunity
towards defined CD8+ T-cell neoantigens was observed after ACT,
which was not detected in TIL before therapy, and which was again
largely absent from subsequent lesions.

To address whether the observed loss of defined neoantigen-specific 
T cells over time could be compensated by acquisition of novel T-cell 
responses, TIL08.10, TIL12.07 and TIL12.09 were analysed for 
recognition of additional neoantigens that were encoded by mutations 
Figure 2 | Identification and dynamics of the dominating intratumoral
neoantigen-specific T-cell repertoire in patient AB. a, Clinical course,
tissue and cell line collection. b–g, T-cell reactivity measured by IFNγ
release. b, Reactivity of ACT product against autologous (asterisk) and
a panel of (partially) HLA class I and II matched melanoma cell lines.
Mean ± s.d. of a representative experiment performed in duplicate is
depicted. c, Reactivity of ACT product against indicated neoepitope-loaded
autologous B cells, autologous tumour cells, and unloaded
autologous B cells (negative control). d–g, IFNγ release by TIL04.01
obtained before ACT (d), TIL08.10 obtained 2 years after ACT (e) and
TIL12.07 (f) and TIL12.09 (g) at disease progression, stimulated with
the indicated neoantigen-loaded B cells, MEL04.01 and MEL12.07.
The mean ± s.d. of at least two independent experiments is shown and
significant increase above unloaded B cells as negative control (P < 0.05;
Student's t-test) is depicted by an asterisk (c–g). h, i, T cells selected
for SEPT2(R300C) reactivity specifically recognize SEPT2(R300C)
and MEL04.01 but not MEL12.07, as indicated by the percentage
of CD137+CD8+ cells. Unloaded B cells (h) or medium (i) served as
background controls. A representative of two independent experiments
is shown.


present in the original tumour MEL04.01. This revealed a low-level 
T-cell response towards the programmed cell death protein 10 
(PDCD10(P28S)) neoantigen in TIL08.10 (Fig. 2e) and a pronounced 
T-cell response towards this neoantigen in both TIL12.07 and TIL12.09 
(Fig. 2f and g, respectively). 

To assess whether absence of detectable SEPT2(R300C) and
CAD(R1854Q) neoantigen-specific T-cell reactivity in TIL derived
from subsequently resected post-ACT lesions coincided with altered 
neoantigen expression, we compared the cell lines derived from the 
original MEL04.01 tumour and the recurrent lesion MEL12.07 isolated 
eight years later. HLA genotyping and flow cytometry revealed that 
Figure 3 | Mutation analysis and neoantigen expression in sequential
tumour lines and tumour tissue of patient AB. DNA sequencing
(Sanger), allele-specific expression (KASP analysis) and RNA expression
(qPCR) of cell lines derived from patient AB (MEL04.01 and MEL12.07).
a, The arrows indicate the non-synonymous mutations resulting in the
identified neoantigens. b, Allele-specific expression of the wild-type (blue)
or mutant (red) alleles. Grey dashed line depicts cut-off value for specific


both cell lines showed comparable HLA cell-surface expression and
no loss of HLA alleles (data not shown). Notably, whereas DNA
sequencing revealed that the mutation in the CAD gene was still present
in cells derived from the recurring lesion (Fig. 3a), RNA expression
analysis demonstrated an approximately tenfold reduction in expression
of this gene (Fig. 3c). Furthermore, whereas RNA expression of
the SEPT2 gene was not significantly altered in the cell line derived
from the recurring lesion (not shown), the mutant allele encoding the
SEPT2(R300C) neoantigen that was present in the original tumour
was selectively lost in the tumour cell line derived from tumour
tissue isolated eight years later (Fig. 3a, b). In agreement with these
findings, selected SEPT2(R300C)-specific T cells (Fig. 2h) responded
to MEL04.01 but not to SEPT2(R300C)-negative MEL12.07 tumour
expression. c, The fold change in RNA expression of the indicated genes
in MEL12.07 versus MEL04.01. The mean ± s.d. of three independent
experiments is shown. d, Detection of the SEPT2(R300C) mutation in MEL04.01
and respective tumour sample but not in normal tissue or tumour
samples 08.10 or 12.09. Collapsed sequencing reads are displayed. In grey,
nucleotide positions correspond to the reference sequence. Mutations are
displayed in red (>T) and green (>A).


cells (Fig. 2i). Next-generation amplicon sequencing revealed that the
SEPT2(R300C) mutation was present in approximately 30% of the sequencing
reads spanning the mutation in both the 04.01 tumour tissue and
its corresponding cell line, whereas it was already lost in the liver lesion
08.10 resected after ACT, as well as in lesion 12.09 (Fig. 3d). Of note,
the PDCD10(P28S) and the EML1(R64W) mutations were also detected in
the 04.01 tumour tissue at frequencies (87% and 36% of sequencing
reads, respectively) similar to those observed within the 04.01 cell
line (Fig. 3d). Furthermore, in contrast to the SEPT2(R300C) mutation,
these mutations were detected in a similar proportion of sequencing
reads in the subsequent clinical samples (Fig. 3d), confirming that
the different tumour lines were representative for the mutation status
in the corresponding clinical samples. Notably, whereas the mutant
PDCD10 allele was already present in the first tumour line (MEL04.01),
RNA expression of PDCD10 had increased >40-fold in the recurring
MEL12.07 lesion (Fig. 3c). Competitive allele-specific PCR (KASP)
analysis confirmed that all PDCD10 expression was derived from the
mutant allele (Fig. 3b). Thus, the observed loss of a mutant allele or
changes of RNA expression in the tumours of patient AB corresponded
exactly with the observed changes in neoantigen-specific T-cell
reactivity.
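
The read-level comparison described above (the proportion of sequencing reads that carry a mutation, tracked across sequential samples) can be sketched as follows. This is only an illustration: the function names, the detection threshold and the read counts are hypothetical and are not values reported in the study.

```python
def variant_read_fraction(mutant_reads: int, total_reads: int) -> float:
    """Fraction of reads spanning a position that carry the mutant base."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= mutant_reads <= total_reads:
        raise ValueError("mutant_reads must lie between 0 and total_reads")
    return mutant_reads / total_reads


def is_detected(mutant_reads: int, total_reads: int, threshold: float = 0.01) -> bool:
    """Call a mutation present when its read fraction reaches `threshold`.

    The 1% default is an arbitrary illustrative cut-off, not one used
    by the authors.
    """
    return variant_read_fraction(mutant_reads, total_reads) >= threshold


# Invented read counts for one mutation in an early and a later sample.
samples = {
    "early lesion": (300, 1000),
    "later lesion": (0, 1000),
}
for name, (mutant, total) in samples.items():
    fraction = variant_read_fraction(mutant, total)
    print(f"{name}: {fraction:.0%} mutant reads, detected={is_detected(mutant, total)}")
```

A mutation that drops from a substantial read fraction to zero between samples, while other mutations keep similar fractions, is the pattern the text interprets as selective loss of a mutant allele.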

In-depth analysis of the tumour-specific reactivity of PBMC, infusion
products and TILs revealed neoantigen-specific T-cell reactivity in both 
long-term survivors after ACT. All mutations yielding these T-cell- 
recognized neoantigens were absent in the COSMIC cancer gene census 
except for SEPT2(R300C) that was described once’, suggesting that 
these are passenger mutations that do not contribute to cellular fitness. 
Over time, we observed changes in expression of four out of six detected 
neoantigens. In two cases, the mutant allele was lost from a subsequent 
tumour, in one case expression of the mutant gene was substantially 
reduced, and in one case expression of the mutant gene was substantially
increased. In four out of four cases, these changes in neoantigen
expression were paralleled by loss or gain of neoantigen-specific T-cell 
reactivity. The current data may be explained both by random clonal 
variation between tumour metastases and by true T-cell-mediated 
selection of antigen-negative variants. Although we cannot distinguish 
between these two hypotheses, support for the latter interpretation is 
provided by the selective recognition of the early but not recurring 
lesions by both RPL28(S76F)-specific and SEPT2(R300C)-specific 
T cells of the two patients. Regardless, the current data demonstrate 
that the interaction between human melanoma and T cells varies 
over time, in a way that is similar to the interaction between the T-cell 
compartment and viral quasispecies”®. We also note that the genetic 
differences and the differences in RNA expression between the 
sequentially obtained tumours characterized here reflect heterogeneity
that has been observed within many human tumours”. As such, it
is plausible that a similar drift of the neoantigen repertoire will also be 
a factor in other human malignancies. Collectively, the data presented 
here demonstrate that under conditions in which a high frequency of 
tumour-specific T cells is present, tumour-cell variants with reduced 
or lost neoantigen expression can emerge, similar to what has been 
observed in mice!. The observation of concurrent acquisition of novel 
T-cell reactivity implies that cancer immunotherapies should aim to 
exploit the adaptive capacity of the immune system in order to maintain 
strong immune surveillance over time. 


METHODS


Data reporting. The experiments were not randomized and the investigators 
were not blinded to allocation during experiments and outcome assessment. No 
statistical methods were used to predetermine sample size. 

Subjects. The study was approved by the Medical Ethical Committee of the Leiden 
University Medical Center and written informed consent was obtained from all 
patients. 

Cell lines and culture. Melanoma cell lines were previously generated from 
resected lesions of patients 05.18 and 04.01 (ref. 23). All other melanoma cell 
lines were also established in the laboratory of Medical Oncology (LUMC, 
Leiden, Netherlands) except for melanoma cell line BLM that was obtained from 
the Netherlands Cancer Institute (Amsterdam, Netherlands), FM3 and FM6 
were provided by P. thor Straten and MZ7.4-mel was obtained from Johannes 
Gutenberg University. Cell lines were authenticated by HLA-genotyping, and 
were mycoplasma negative. All melanoma cell lines were cultured in tumour cell 
medium (that is, DMEM (Life Technologies, Breda, The Netherlands) with 8%
heat-inactivated fetal calf serum (FCS), penicillin (100 IU ml−1), streptomycin
(100 µg ml−1) and L-glutamine (4 mM), all from Lonza Biowhittaker (Breda,
Netherlands)). HLA genotyping of the cell lines was performed by the Department
of Immunohematology and Blood Bank of the LUMC. Autologous immortalized
B cells, obtained by transduction with the anti-apoptotic genes Bcl-6 and Bcl-XL”°
were provided by AIMM therapeutics (Amsterdam, The Netherlands) and were
cultured in IMDM (Lonza) with 8% heat-inactivated FCS, penicillin (100 IU ml−1),
streptomycin (100 µg ml−1) and L-glutamine (4 mM). B cells were stimulated twice
a week with irradiated (70 Gy) CD40L-expressing mouse L-cell fibroblasts and
murine IL-21 (50 ng ml−1, AIMM therapeutics).

TIL were obtained by culturing small tumour fragments in T-cell medium
(IMDM with 7.5% heat-inactivated pooled human serum (Sanquin Blood bank,
Dordrecht, The Netherlands), penicillin (100 IU ml−1), streptomycin (100 µg ml−1)
and L-glutamine (4 mM)) supplemented with 1,000 IU ml−1 recombinant human
(rh) IL-2 (Aldesleukin, Novartis). TIL medium plus rhIL-2 was refreshed every
two to three days. Tumour-specific T-cell batches were obtained by mixed
lymphocyte tumour cell culture (MLTC) as previously described”. Briefly,
peripheral blood mononuclear cells were incubated
with irradiated (100 Gy) autologous tumour cells at a 15:1 effector:target cell ratio
in T-cell medium supplemented with rhIL-4 (10 U ml−1; Cellgenix, Cellgro) to
prevent expansion of natural killer (NK) cells. From day 2 onward, medium was
refreshed every two to three days with T-cell medium plus rhIL-2 (150 IU ml−1).
T cells were re-stimulated weekly with irradiated autologous tumour cells and
cultured for a total of 4 weeks before cryopreservation for further experiments
and ACT treatment”’.

To test for reactivity of T cells used for ACT against shared antigens, a panel of
(partially) HLA-matched melanoma cell lines was used. At least one cell line with
a matched allele for every HLA class II allele was tested. For HLA class I alleles,
3–8 matched cell lines were tested for each allele, except for the heterozygously
expressed HLA-A*23.01 and HLA-B*49.01 in patient BO, for which no matched
cell lines were available.

Neoantigen identification. Whole-exome and RNA sequencing was performed
and 31-mer synthetic peptides covering the non-synonymous somatic mutations
were manufactured as previously described’. Importantly, no selection based on
in silico prediction of MHC binding affinity was used, in order not to exclude any
potential epitopes pre-emptively. Next, T cells were incubated with target cells (that
is, tumour cells or Bcl-6/Bcl-XL B cells either unloaded or preloaded overnight
with peptide pools or single peptides (10 µg ml−1 per peptide)). Reactivity of
T cells was measured after 24–48 h co-incubation with target cells by IFNγ secretion
using ELISA (Sanquin) or intracellular cytokine staining for IFNγ or expression of
the activation marker CD137 by FACS analysis, as previously described”*. Where
indicated, neoepitope-specific T cells were selected after overnight stimulation of
T cells with specific Bcl-6/Bcl-XL B cells loaded with synthetic 31-mer peptides
using the CD137 MicroBead kit (Miltenyi Biotech) according to the manufacturer's
instructions.

Sanger sequencing of DNA samples. Cell line DNA was extracted with the 
Genomic Wizard DNA purification kit (Promega). DNA from microdissected 
formalin-fixed archival material was extracted with the NucleoSpin FFPE DNA 
kit (Macherey-Nagel). Primers were designed with Primer-BLAST 
(http://www.ncbi.nlm.nih.gov/tools/primer-blast/) (see below). PCR reactions were carried out 
in 15 µl with 10 ng of DNA, 1× iQ Supermix (Bio-Rad), and 10 pmol of forward 
and reverse primer. PCR products were purified with the NucleoSpin Gel and 
PCR Clean-up kit (Macherey-Nagel) and sequenced in both forward and reverse 
directions at Macrogen Europe (Meibergdreef). 

Real-time PCR gene expression analysis. Total RNA was isolated with the 
TRIzol (ThermoFisher) method including DNase treatment in suspension using 
rDNase (Macherey-Nagel). cDNA was synthesized using 2 µg of RNA, 50 ng of 
p(dT)15 (Roche), 0.3 µg of Random Primers (ThermoFisher), 1 mM dNTPs 
(ThermoFisher), 4 µl 5× AMV-RT buffer (Roche), 10 units of AMV-RT (Roche) 
and 20 units of RNasin Ribonuclease Inhibitor (Promega). cDNA synthesis was 
carried out at 42°C for 1h. qPCR primers were designed with Primer-BLAST 
(http://www.ncbi.nlm.nih.gov/tools/primer-blast/) (see below). 

qPCR mixes consisted of: 2 µl of 1:25 cDNA solution, 3 pmol of primer (forward 
and reverse), iQ SYBR Green Supermix in 1 11 for PDCD10 and CPSF6 (house- 
keeping gene); 5 1l of 1:25 cDNA solution, 6 pmol of primer (forward and reverse), 
iQ SYBR Green Supermix (Bio-rad). Amplification cycles: 95°C, 10s; 60°C, 30s 
for CPSF6 and PDCD10. 95°C, 10s; 62°C, 30s; 72°C, 30s for EML1, CAD, and 
SEPT2. Melting was performed to confirm the specificity of the assays. The 
ΔCt method was applied to calculate the levels of gene expression relative to the 
housekeeping gene. At least two independent measurements were performed to 
assess gene expression and the fold change in expression in the recurrent cell line 
relative to expression in the original cell line was calculated. 
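
The relative-expression calculation described above can be sketched in Python as follows. This is only an illustration, not the authors' analysis: the function names and Ct values are hypothetical, and the conversion 2^(−ΔCt) assumes roughly 100% amplification efficiency for both the target and the housekeeping gene.

```python
def relative_expression(ct_target: float, ct_housekeeping: float) -> float:
    """Expression of a target gene relative to the housekeeping gene.

    Assumes ~100% PCR efficiency, so one cycle equals a twofold difference.
    """
    delta_ct = ct_target - ct_housekeeping
    return 2.0 ** (-delta_ct)


def fold_change(recurrent: float, original: float) -> float:
    """Fold change of the recurrent cell line relative to the original one."""
    return recurrent / original


# Hypothetical Ct values, for illustration only.
original = relative_expression(ct_target=24.0, ct_housekeeping=20.0)   # 2**-4
recurrent = relative_expression(ct_target=27.0, ct_housekeeping=20.0)  # 2**-7

print(fold_change(recurrent, original))  # 0.125, an eightfold reduction
```

In practice the value for each cell line would be averaged over the independent measurements before the fold change is taken.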

Amplicon next-generation sequencing. Amplicon next-generation sequencing 
was used to confirm that the cell lines were representative of the corresponding 
tumour tissue. Two independent primer sets per target were designed (see 
below) to amplify the SEPT2, PDCD10 and EML1 mutations 
in MEL04.01, MEL12.07, and their respective clinical samples. 18-mer, M13 tails 
were added to the 5’-end of the forward (5'-TGTAAAACGACGGCCAGT-3’) and 
reverse (5'-CAGGAAACAGCTATGACC-3') primers. Two multiplex PCRs were 
performed for each sample; each PCR reaction containing one of the primer sets 
for each target. PCRs were performed with the FastStart High Fidelity PCR System 
(Roche). Following the pooling of both reactions and purification with AMPure 
XP beads (Beckman Coulter), a second PCR was performed in order to integrate 
A and P1 adaptor sequences for Ion Torrent and distinct barcodes for each sample. 
Sample libraries were quantified by qPCR, normalized and analysed by the Ion 
PGM system in an Ion 318 chip (ThermoFisher Scientific). More than 10,000 
sequencing reads covering each mutation were generated. 
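
As a small illustration of the tailing step, the Python sketch below prepends the M13 tails quoted above to a gene-specific primer and strips them again. The helper names are hypothetical; the two tailed primers used as a check are the SEPT2 p1 pair from the primer list in this section.

```python
M13_FORWARD_TAIL = "TGTAAAACGACGGCCAGT"
M13_REVERSE_TAIL = "CAGGAAACAGCTATGACC"


def add_tail(tail: str, gene_specific: str) -> str:
    """Return the tailed primer: M13 tail at the 5' end, then the gene-specific part."""
    return tail + gene_specific


def strip_tail(primer: str, tail: str) -> str:
    """Recover the gene-specific part of a tailed primer."""
    if not primer.startswith(tail):
        raise ValueError("primer does not start with the expected M13 tail")
    return primer[len(tail):]


# SEPT2 p1 primers as listed in the primer section (5' to 3').
sept2_forward_p1 = "TGTAAAACGACGGCCAGTCCTCTTCTCTTTCAGCACCCA"
sept2_reverse_p1 = "CAGGAAACAGCTATGACCTGGCTCAGGGTGACACAATAC"

forward_specific = strip_tail(sept2_forward_p1, M13_FORWARD_TAIL)
reverse_specific = strip_tail(sept2_reverse_p1, M13_REVERSE_TAIL)
print(forward_specific)  # CCTCTTCTCTTTCAGCACCCA
print(reverse_specific)  # TGGCTCAGGGTGACACAATAC
```

The same check can be run over the whole amplicon primer list to confirm that every forward and reverse primer carries the correct 18-mer tail.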

Allele-specific expression. Single nucleotide polymorphism (SNP) genotyping 
was performed using the competitive allele-specific PCR (KASPar) assay, following 
the manufacturer's protocol (LGC Genomics). The oligonucleotides were designed 
using Primerpicker (KBioscience) (see below). 

After initial denaturation at 94°C for 15 min, 10 cycles consisting of 94°C for 
20s and touchdown from 65°C in 0.8°C steps per cycle were run. Subsequently, 
30 cycles of 94°C for 20s, 57°C for 1 min, and 25°C for 10s followed by fluorescent 
detection were applied. Three independent measurements were performed to 
assess allele-specific expression. The cut-off value for specific expression was set at 
>500 relative fluorescence units (RFU) based on the corresponding positive 
control. 
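
The touchdown schedule and the fluorescence cut-off can be written out as a short Python sketch. It assumes that the first touchdown cycle anneals at 65°C and that each subsequent cycle is 0.8°C lower; the text does not state whether the decrement is applied before or after the first cycle, and the function names are hypothetical.

```python
def touchdown_annealing(start_c: float, step_c: float, cycles: int) -> list[float]:
    """Annealing temperature for each touchdown cycle (assumes cycle 1 uses start_c)."""
    return [round(start_c - step_c * i, 1) for i in range(cycles)]


def is_specific(rfu: float, cutoff: float = 500.0) -> bool:
    """Apply the >500 RFU cut-off used to call allele-specific expression."""
    return rfu > cutoff


temps = touchdown_annealing(start_c=65.0, step_c=0.8, cycles=10)
print(temps[0], temps[-1])  # 65.0 57.8

# The 30 subsequent cycles then anneal at a constant 57 °C.
for rfu in (480.0, 500.0, 1250.0):  # hypothetical readings
    print(rfu, is_specific(rfu))
```

Under this reading the last touchdown cycle anneals at 57.8°C, just above the 57°C used in the remaining 30 cycles.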

Loss of heterozygosity. The presence of polymorphic markers around the RPL28 
and SEPT2 genes was investigated with the UCSC genome browser. AFMA357YH1 
and D19S1142, and AFM182YA5 and RH56150, were employed to determine the 
zygosity status around the RPL28 and SEPT2 genes, respectively. Primers were 
designed around these markers (see below) and hexachloro-fluorescein label was 
added to the 5’-end of the forward primer. PCR amplifications were performed with 
10ng of DNA in iQ Supermix (Bio-Rad). PCR products were mixed in a formamide 
solution containing GeneScan 500 TAMRA Size Standard (Applied Biosystems). 
Thereafter, samples were loaded in a 4-capillary 3130 DNA Analyzer and results 
were interpreted with the GeneMapper 4.1 software (Applied Biosystems). The 
only marker that was informative, that is, that presented a heterozygous status in 
the patient’s germline DNA was AFMA357YH1. 

Used primer sequences. For PCR and Sanger sequencing, the following primers were 
used: KIAA0020 forward 5'-GGATGGGTAAGTGGACTTCTGG-3', KIAA0020 
reverse 5'-GAAAGGTCCTGTGGTACTTTGTT-3'; RPL28 forward 
5'-GGGCTATGAGTGTGGCAGAAG-3', RPL28 reverse 
5'-AAAGCAAGAATCCATCCCTCTC-3'; CAD forward 5'-CCCTCCAGACACCTGAAAGAC-3', 
CAD reverse 5'-GGGCATGTGAGAAGCTGTGA-3'; SEPT2 forward 
5'-AGACATTTCCACGGCCATACT-3', SEPT2 reverse 
5'-GCTCAGGGTGACACAATACAGA-3'; PDCD10 forward 5'-TGCCTAACGCACCGATAAGA-3', 
PDCD10 reverse 5'-TGTTCTTTCTTCCTCTTTCACCA-3'; EML1 forward 
5'-TGCAAGTAGCATGGAGGTGA-3', EML1 reverse 
5'-CAAGATTCTGCCACGAGACCA-3'. 

RT-PCR primers: KIAA0020 forward 5'-CTGGAGACGTTCAGCCTACC-3', 
KIAA0020 reverse 5'-TGTGAAGCTCTCCGTCCTTG-3'; PDCD10 forward 
5'-ACCGCAGGGCACTTGAAC-3', PDCD10 reverse 
5'-GGTTGGCACTTACGAACACA-3'; EML1 forward 5'-TGGGTTCCCTCTGCCTGTAA-3', 
EML1 reverse 5'-CCTTCTGGCCACACTCCAAA-3'; CAD forward 
5'-GAGATGACCACGACACCTGAA-3', CAD reverse 
5'-GCTCCTCAGCTGGCAAA-3'. 

KASPar primers: RPL28 forward A1 
5'-GAAGGTGACCAAGTTCATGCTCAGCGGAAGCCTGCCACCTT-3', RPL28 forward A2 
5'-GAAGGTCGGAGTCAACGGATTAGCGGAAGCCTGCCACCTC-3', RPL28 
reverse 5'-GCGTGGCGCGAGCATTCTTGTT-3'; PDCD10 forward A1 
5'-GAAGGTGACCAAGTTCATGCTACTCGTTCTAGCTCATTAAACACAGA-3', 
PDCD10 forward A2 
5'-GAAGGTCGGAGTCAACGGATTCTCGTTCTAGCTCATTAAACACAGG-3', 
PDCD10 reverse 5'-CTATGCCCCTCTATGCAGTCATGTA-3'; CAD forward A1 
5'-GAAGGTGACCAAGTTCATGCTGGTCGGAGGCTCGATGGATTT-3', CAD forward A2 
5'-GAAGGTCGGAGTCAACGGATTGGTCGGAGGCTCGATGGATTC-3', CAD 
reverse 5'-GCTTCCTGATGGCCGCTTCCAT-3'; SEPT2 forward A1 
5'-GAAGGTGACCAAGTTCATGCTCCTCTCTTGAGTCTCTCAGAACA-3', 
SEPT2 forward A2 
5'-GAAGGTCGGAGTCAACGGATTCCTCTCTTGAGTCTCTCAGAACG-3', SEPT2 reverse 
5'-GGTGACCCAGGACCTTCATTATGAA-3'. 

For amplification from tissue: RPL28 forward 5'-GGTGCAGGTTAGGTGGACTG-3', 
RPL28 reverse 5'-CGGATCATGTGTCTGATGCTG-3'. 

For amplicon-based next-generation sequencing: SEPT2 forward p1 
5'-TGTAAAACGACGGCCAGTCCTCTTCTCTTTCAGCACCCA-3', SEPT2 
reverse p1 5'-CAGGAAACAGCTATGACCTGGCTCAGGGTGACACAATAC-3', 
SEPT2 forward p2 5'-TGTAAAACGACGGCCAGTGCCACCTTGGTGCATTCCTC-3', 
SEPT2 reverse p2 5'-CAGGAAACAGCTATGACCACACAATACAGAGAAAGGGGCA-3'; 
EML1 forward p1 5'-TGTAAAACGACGGCCAGTGACAGACCGCATTGCTTCAC-3', 
EML1 reverse p1 5'-CAGGAAACAGCTATGACCCCTTTGGTAGGTCCTTTCCTGT-3', 
EML1 forward p2 5'-TGTAAAACGACGGCCAGTTCCAGATGCAAGAAGACGACA-3', EML1 
reverse p2 5'-CAGGAAACAGCTATGACCGTTCCTGTGTCACTCAAACGC-3'; 
PDCD10 forward p1 5'-TGTAAAACGACGGCCAGTAGCAGACGAATAAATAAAGCAGGA-3', 
PDCD10 reverse p1 5'-CAGGAAACAGCTATGACCTGAAGCTGAGACCACATCCAT-3', 
PDCD10 forward p2 5'-TGTAAAACGACGGCCAGTGCAGGAATTAAAGAATTGCAGAGTT-3', 
PDCD10 reverse p2 5'-CAGGAAACAGCTATGACCAGAATGAAGCTGAGACCACATC-3'. 

For loss of heterozygosity: AFMA357YH1 forward (HEX) 
5'-CATAGGCCCAGCAAGCTCAA-3', AFMA357YH1 reverse 
5'-ACGTGTCTTCCTTCGTACCC-3'; AFM182YA5 forward (HEX) 5'-TGACACGTGAACAGACTAAGCA-3', 
AFM182YA5 reverse 5'-CAATACGGGAGAGCCAGTTGT-3'; D19S1142 
forward (HEX) 5'-AGGACGGAAAGGCAGAGAAA-3', D19S1142 reverse 
5'-CCCTCTGATCCTCTTTGCT-3'; RH56150 forward (HEX) 
5'-AGAAAGGGTCGAGGGACTCA-3', RH56150 reverse 
5'-GCCAGTTAATTTGGAAGCAGGT-3'. 

Statistics. As indicated in the corresponding figure legends, statistical significance was 
calculated by GraphPad Prism 6.0c software using the unpaired, two-tailed Student's 
t-test and P values <0.05 were considered significant, unless otherwise indicated. 

eIF3d is an mRNA cap-binding protein that is 
required for specialized translation initiation 


Amy S. Y. Lee, Philip J. Kranzusch, Jennifer A. Doudna & Jamie H. D. Cate 


Eukaryotic mRNAs contain a 5’ cap structure that is crucial 
for recruitment of the translation machinery and initiation of 
protein synthesis. mRNA recognition is thought to require direct 
interactions between eukaryotic initiation factor 4E (eIF4E) and 
the mRNA cap. However, translation of numerous capped mRNAs 
remains robust during cellular stress, early development, and cell 
cycle progression despite inactivation of eIF4E. Here we describe 
a cap-dependent pathway of translation initiation in human cells 
that relies on a previously unknown cap-binding activity of eIF3d, a 
subunit of the 800-kilodalton eIF3 complex. A 1.4 Å crystal structure 
of the eIF3d cap-binding domain reveals unexpected homology to 
endonucleases involved in RNA turnover, and allows modelling of 
cap recognition by eIF3d. eIF3d makes specific contacts with the cap, 
as exemplified by cap analogue competition, and these interactions 
are essential for assembly of translation initiation complexes on 
eIF3-specialized mRNAs such as the cell proliferation regulator 
c-Jun (also known as JUN). The c-Jun mRNA further encodes an 
inhibitory RNA element that blocks eIF4E recruitment, thus 
enforcing alternative cap recognition by eIF3d. Our results reveal 
a mechanism of cap-dependent translation that is independent of 
eIF4E, and illustrate how modular RNA elements work together to 
direct specialized forms of translation initiation. 

The rate-limiting step of translation initiation is the recognition 
of the 5’ cap structure by eIF4E. eIF4E activity is highly regulated 
by extracellular stimuli, predominantly through steric hindrance by 
eIF4E-binding proteins (4E-BPs). The translational efficiencies of 
mRNAs range in sensitivity to 4E-BP inhibition, and these differences 
have conventionally been addressed by categorizing translation 
into cap-dependent versus cap-independent pathways. However, the 
mechanisms underlying mRNA sensitivity to active eIF4E levels remain 
enigmatic as all cellular mRNAs maintain the same 5’ cap structure. 

Recently, we discovered a new translation pathway driven by RNA 
interactions with eIF3 that is used by a subset of cell proliferation 
mRNAs, with the prototype member being the mRNA encoding the 
early response transcription factor c-Jun. eIF3-specialized translation 
is cap-dependent and requires recruitment of eIF3 to an internal 
stem-loop structure in the 5’ untranslated region (UTR). However, 
the translational efficiency of a subset of these mRNAs is unaffected 
by eIF4E inactivation, suggesting that cap recognition may proceed 
by a non-canonical mechanism (Supplementary Table 1). 

To understand how cap recognition occurs during eIF3-specialized 
translation, we examined whether c-Jun mRNA uses the canonical 
eIF4F cap-binding complex during initiation. We programmed 
in vitro translation extracts from human 293T cells with capped and 
polyadenylated c-Jun mRNA, and isolated the 48S complex to assess the 
presence of the eIF4F factors (eIF4G1, eIF4A1 and eIF4E) (Fig. 1a, b). 
Unexpectedly, although c-Jun mRNA translation initiation complexes 
contain eIF3 and the small ribosomal subunit, they are depleted of 
all eIF4F components. By contrast, eIF4F is readily detectable in 48S 
initiation complexes formed on a canonical eIF4E-dependent mRNA, 
ACTB (Fig. 1b). In agreement with the absence of eIF4F, c-Jun levels are 
unaffected by cell treatment with the mTOR inhibitor INK128 (ref. 7), 
which inactivates eIF4E, or with eIF4A inhibitors (Extended Data 
Fig. 1). These results indicate that c-Jun mRNA translation occurs independently 
of eIF4F and that the process of eIF3-specialized translation 
is fundamentally distinct at the initial stage of 5’ cap recognition. 
eIF3-specialized translation requires recognition of an internal RNA 
stem-loop for efficient translation. Therefore, we asked whether eIF3 

Figure 1 | 5’ end recognition of c-Jun mRNA is eIF4F-independent. 
a, Distribution of c-Jun or ACTB mRNA-containing initiation complexes 
in programmed 293T cell in vitro translation extracts. The mRNA 
abundance (black line) is expressed as the fraction of total recovered 
transcripts. The results are given as the mean ± s.d. of a representative 
quantitative RT-PCR experiment performed in duplicate. The polysome 
profile (grey line) is plotted as relative absorbance at 254 nm versus elution 
fractions. b, Western blot analysis of initiation factors in 48S translation 
complexes formed on c-Jun and ACTB mRNAs. 293T, total protein from 
293T in vitro translation extracts. rpS19, ribosomal protein S19. For gel 
source data, see Supplementary Fig. 1. c, Phosphorimage of SDS-PAGE 
gel resolving RNase-protected ³²P-internal or ³²P-cap-labelled c-Jun 5’ 
UTR RNA crosslinked to eIF3 subunits. Recombinant eIF3a migrates 
at ~100 kDa owing to a C-terminal truncation. The results of a–c are 
representative of three independent experiments. 

Figure 2 | Structure of eIF3d reveals a conserved cap-binding domain. 
a, Cartoon schematic and phylogenetic conservation of eIF3d amino acid 
sequence according to physicochemical property similarity. Peptides in the 
cap-binding domain as identified by limited proteolysis are mapped below. 
b, Structure of the eIF3d cap-binding domain. α-helices are coloured 


might also be involved in 5’ cap recognition. In agreement with the 
previously demonstrated RNA-binding capability of eIF3, the four 
eIF3 RNA-binding subunits, eIF3a, eIF3b, eIF3d and eIF3g, provide 
RNase protection to internally ³²P-labelled c-Jun 5’ UTR RNA after 
UV₂₅₄-induced crosslinking (Fig. 1c). By contrast, when the ³²P label 
is placed in the 5’ cap of c-Jun mRNA, RNase protection is observed 
with a single subunit of eIF3, corresponding to eIF3d (Fig. 1c, Extended 
Data Fig. 2a). We confirmed subunit identity by limited proteolysis and 
mass spectrometry, and defined a C-terminal region of eIF3d that is 
responsible for protection of the 5’ mRNA terminus (Extended Data 
Fig. 2). The mapped C-terminal region of eIF3d is broadly conserved 
throughout plant, fungal and animal phylogeny (Fig. 2a, Extended Data 
Fig. 3), suggesting the apparent 5’ end recognition activity of eIF3d is 
an evolutionarily preserved function of the eIF3 complex. 

To understand how eIF3d recognizes the 5’ RNA terminus, we determined 
a 1.4 Å crystal structure of the conserved C-terminal domain of 
eIF3d from Nasonia vitripennis (65% identical, 84% similar to human 
eIF3d) using sulfur anomalous dispersion for phase determination 
(Extended Data Table 1, Extended Data Fig. 3). The structure of eIF3d 
reveals a complex fold that forms a cup-shaped architecture with a 
positively charged central tunnel that is negatively charged at its base 
(Fig. 2b). Remarkably, despite no significant sequence homology, 
the structural topology of eIF3d is nearly identical to the DXO proteins, 
a recently described family of 5’ cap-endonucleases involved in 
RNA quality control (Fig. 2c, Extended Data Fig. 4). In contrast 
to DXO, eIF3d contains a unique insertion of ~15 highly conserved 
amino acids between strand β5 and helix α6. The eIF3d-specific insertion 
folds down along the front face of the domain, making loosely 
packed charged interactions that close off the RNA binding tunnel 
(Extended Data Fig. 5). We term this insertion an ‘RNA gate’, as the 
sequence clashes with the path of single-stranded RNA (ssRNA) bound 
to DXO and must undergo a conformational change for eIF3d to 

in blue and β-strands in magenta. c, Topological maps of the eIF3d cap- 
binding domain and the DXO cap-endonuclease domain. d, Structures 
comparing the eIF3d cap-binding domain with its gate insertion to DXO 
bound to RNA (PDB 4J7L). 


become competent for RNA recognition (Fig. 2d). We determined the 
structure of eIF3d in two additional crystal forms, and confirmed the 
RNA gate exhibits a closed conformation regardless of crystal packing 
(Extended Data Fig. 6). As eIF3d does not bind all capped RNAs, 
we postulate that the RNA gate regulates cap recognition to prevent 
promiscuous mRNA binding before assembly of eIF3d into the full 
eIF3 complex. We tested this model using c-Jun mRNA, and verified 
that eIF3d cap recognition occurs only in the context of a full eIF3 
complex and requires previous eIF3-sequence-specific RNA interactions 
with the eIF3-recruitment stem-loop (Extended Data Fig. 7). 
Allosteric communication between eIF3 subunits during initial RNA 
recruitment likely facilitates eIF3d RNA gate opening to allow 5’ end 
recognition. The structure of eIF3d therefore reveals a new cap-binding 
protein and explains the ability of the eIF3 complex to protect the 5’ 
end of mRNA (Fig. 1c). 

To validate the structural finding that eIF3d is a cap-binding protein, 
we examined the ability of eIF3 to bind the c-Jun mRNA 5’ cap in the 
presence of competitor ligands. eIF3d cap recognition is sensitive to 
m⁷GDP competition but resistant to GDP, indicating that, analogous 
to eIF4E, eIF3d specifically interacts with the 5’ cap and requires a 
mature methylated cap structure for recognition (Fig. 3a). Using the 
DXO-RNA structure as a template, we modelled a capped ssRNA 
along the basic binding groove shared between eIF3d and DXO and 
identified two conserved helices (α5 and α11) likely to be involved in 
cap recognition (Fig. 3b). We purified recombinant eIF3 containing 
helix α5- or α11-mutated eIF3d and demonstrated that both mutants 
have markedly reduced ability to crosslink to the c-Jun mRNA cap 
(Fig. 3c). eIF3d-mutated complexes retain wild-type levels of RNA 
binding, indicating that these residues specifically coordinate 5’ mRNA 
cap recognition (Extended Data Fig. 8). We next introduced haemagglutinin 
(HA) epitope-tagged wild-type or mutant eIF3d into 293T 
cells, and measured the assembly of 48S initiation complexes on c-Jun 

Figure 3 | eIF3d cap-binding activity is required for efficient 48S 
initiation complex formation on specific mRNAs. a, Phosphorimage of 
SDS-PAGE gel resolving RNase-protected ³²P-cap-labelled c-Jun 5’ UTR 
RNA crosslinked to eIF3 in the presence of competitor ligands (m⁷GDP, 
GDP). b, Electrostatic surface view of the eIF3d cap-binding domain 
coloured by charge, with a zoomed view of ssRNA and cap analogue 
modelled according to their positions bound to DXO. Positive charge is 
coloured blue, negative charge is in red, and the RNA gate is removed for 
clarity. c, Phosphorimage of SDS-PAGE gel resolving RNase-protected 
³²P-cap-labelled c-Jun 5’ UTR RNA crosslinked to wild-type (WT) or helix 
α5- or helix α11-mutant eIF3. Helix α5-mutant eIF3d: D249Q/V262I/Y263A; 
helix α11-mutant eIF3d: T317E/N320E/H321A. d, Incorporation 
of c-Jun and ACTB mRNA into initiation complexes by wild-type, helix 
α5-, or helix α11-mutant eIF3d as measured by quantitative RT-PCR. The 
mRNA-ribosome association is expressed as the ratio of the quantity of 
mRNA transcripts to 18S rRNA and normalized to the wild-type sample. 
The results are representative of three independent experiments and given 
as the mean ± s.d. from a representative quantitative RT-PCR experiment 
performed in duplicate. 


mRNA by quantitative RT-PCR. Mutations to the predicted eIF3d 
cap-binding surface inhibit c-Jun mRNA incorporation into transla- 
tion complexes, while the control ACTB mRNA is unaffected (Fig. 3d, 
Extended Data Fig. 8). These results demonstrate that cap binding by 
eIF3d is required for efficient initiation complex formation during 
eIF3-specialized translation. 

eIF3d recognition of the 5’ cap structure provides an alternative 
cap-dependent translation mechanism from canonical eIF4F cap 
recognition. Perplexingly, when the RNA stem-loop element that 
recruits eIF3 to the c-Jun mRNA is deleted, translation is inhibited 
even though the mRNA contains a 5’ cap. We proposed that an RNA 
element within the c-Jun mRNA blocks recruitment of the eIF4F complex. 
In support, the 5’ cap of c-Jun mRNA crosslinks less efficiently to 
purified eIF4E than that of the ACTB mRNA (Extended Data Fig. 9). 
To identify the eIF4F inhibitory element, we constructed luciferase 
reporters to test deletions in the c-Jun 5’ UTR (Fig. 4a). Deletion of 

Figure 4 | An RNA element inhibits eIF4F recruitment and directs 
mRNAs to use an eIF3-specialized translation pathway. a, Schematic 
of c-Jun 5’ UTR truncation-luciferase (Luc) reporter mRNAs. SL, stem- 
loop. b, Luciferase activity from in vitro translation of mRNAs containing 
truncations of the c-Jun 5’ UTR, with or without the internal eIF3- 
recruitment stem-loop sequence. The results are given as the mean ± s.d. 
of three independent experiments, each performed in triplicate. c, Western 
blot analysis of initiation factors in 48S translation initiation complexes 
formed on c-Jun mRNA with a 5’ 153-nucleotide truncation. 293T, total 
protein from 293T in vitro translation extracts. The result is representative 
of three independent experiments. For gel source data, see Supplementary 
Fig. 1. d, Model for eIF3d-directed cap-dependent mRNA translation. 
An eIF4F-inhibitory RNA element ensures that mRNA translation occurs 
through an eIF3-specialized pathway. 


the 5’ 153 nucleotides, but not the initial 67 nucleotides, was suffi- 
cient to allow c-Jun mRNA translation to occur independently of the 
eIF3-recruitment stem-loop, suggesting that canonical cap-dependent 
translation is no longer blocked (Fig. 4b). We confirmed by western 
blot analysis of the 48S initiation complex formed on c-Jun mRNA 
lacking the 5’ 153 nucleotides that the eIF4F components are now 
present (Fig. 4c). 

Together, we put forth a model of a previously undiscovered cap- 
dependent translation initiation pathway controlled by eIF3d recogni- 
tion of the 5’ mRNA cap (Fig. 4d). We postulate that encoding more than 
one mechanism of cap-dependent translation allows cells to control 
protein synthesis specifically in cellular environments in which eIF4E 
is inactivated. In support, c-Jun mRNA translation is resistant to treatment 
of cells with chemicals that activate the 4E-BPs (Extended 
Data Fig. 1). As modulation of eIF4E cap-binding activity allows cells 
to incorporate extracellular stimuli into altered translation outputs, 
it will be important to discover whether eIF3d activity is analogously 
regulated. Furthermore, our data indicate that the c-Jun mRNA 
encodes an additional cis-acting RNA element that blocks eIF4F to 
ensure translation can only occur through an eIF3-specialized pathway. 
RNA elements that block eIF4F recruitment may be a common theme 
to direct mRNAs into specific translation pathways to ensure controlled 
protein expression. For example, a subset of homeobox mRNAs contain 
an RNA element that blocks cap-dependent translation to ensure usage 
of an internal ribosome entry site and to allow for correct homeobox 
expression during embryonic development. Several eIF3-specialized 
mRNAs encode proteins involved in the control of cell proliferation, 
suggesting that their translation may also require enhanced 
regulation. 

While considerable advances have been made in the structural 
understanding of eIF3 bound to the ribosome, direct localization of 
eIF3d in a 48S complex remains unclear. Thus, understanding how 
eIF3d functions and assembles within the full translation initiation 
complex will have important mechanistic implications in how cap 
recognition links to mRNA ribosomal recruitment. Our discovery of 
eIF3d as a cap-binding protein now reveals a new translation pathway 
independent of eIF4E, and adds another layer of cap-dependent 
translation. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 

METHODS 


Cells and transfections. Human 293T cells were maintained in DMEM 
(Invitrogen) supplemented with 10% FBS (Seradigm). The cells were obtained 
from the University of California, Berkeley, Cell Culture Facility, which authen- 
ticates cells by STR profiling and tests for mycoplasma contamination. Plasmid 
transfections were performed using Lipofectamine 2000 (Invitrogen), following 
the manufacturer's protocol, and polysome or immunoprecipitation analyses were 
performed at 48 h after transfection. For INK128 (Cayman Chemical) cell treatment, 
293T cells were incubated with the indicated concentration of INK128 for 
~14–16 h before cell lysis. 
Plasmids. To generate the eIF3d expression plasmids, eIF3d was amplified from 
human cDNA and inserted into pcDNA5/FRT. A 39-nucleotide linker followed 
by the HA epitope tag (YPYDVPDYA) was subsequently inserted before the eIF3d 
stop codon. The wild-type c-Jun 5’ UTR luciferase reporter plasmid was previously 
described. To generate the c-Jun in vitro transcription template, the 5’ UTR, ORF 
and 3’ UTR were separately amplified from human cDNA and stitched together 
downstream of a T7 promoter by Gibson cloning into pcDNA4. The ACTB 
in vitro transcription template was constructed by addition of a T7 promoter during 
amplification of the full mRNA from human cDNA and inserted into pcDNA4. 
Western blot. Western blot analyses were performed using the following anti- 
bodies: anti-eIF3d (Bethyl A301-758A), anti-eIF4A1 (Cell Signaling 2490), 
anti-eIF4G1 (Cell Signaling 2858), anti-rpS19 (Bethyl A304-002A), anti-eIF4E 
(Bethyl A301-154A), anti-rpLPO (Bethyl A302-882A), anti-HA epitope tag (Pierce 
26183), anti-c-Jun (Cell Signaling 9165), anti-Hsp90 (BD Biosciences 610418), and 
anti-4E-BP1 (Cell Signaling 9644). 
In vitro RNA transcription and labelling. Unlabelled RNAs were in vitro transcribed, 
polyadenylated, and capped as previously described. For internal radiolabelling 
of RNAs, in vitro transcription was performed in the presence of 0.1 µM 
[α-³²P]ATP, then the RNA was subsequently capped with vaccinia virus enzymes 
(NEB). For radiolabelling of the 5’ cap, in vitro transcribed RNAs were capped with 
vaccinia virus enzymes and [α-³²P]GTP. RNAs were purified by phenol-chloroform 
extraction and ethanol precipitation. 
In vitro translation. In vitro translation extracts were made from human 293T 
cells as previously described. Lysates were nuclease-treated with 18 gel U µl⁻¹ 
micrococcal nuclease (NEB) in the presence of 0.7 mM CaCl₂ for 10 min at 25°C, 
and the digestion was stopped by addition of 2.24 mM EGTA. Each translation 
reaction contained 50% in vitro translation lysate and buffer to make the final 
reaction with 0.84 mM ATP, 0.21 mM GTP, 21 mM creatine phosphate (Roche), 
45 U ml⁻¹ creatine phosphokinase (Roche), 10 mM HEPES-KOH, pH 7.6, 2 mM 
DTT, 8 mM amino acids (Promega), 255 mM spermidine, 1 U ml⁻¹ murine RNase 
inhibitor (NEB), and mRNA-specific concentrations of Mg(OAc)₂ and KOAc. The 
optimal magnesium and potassium levels to add were determined to be 1.5 mM 
Mg(OAc)₂ and 150 mM KOAc for c-Jun mRNA, and 1 mM Mg(OAc)₂ and 150 mM KOAc for 
ACTB mRNA. For luciferase assays, translation reactions were incubated for 1 h at 
30°C, then luciferase activity was assayed. 
48S initiation complex purification. For 48S initiation complex purification from 
in vitro translation reactions, 180 µl reactions were incubated in the presence of 
GMP-PNP for 20 min at 30°C and centrifuged for 6 min at 12,000g at 4°C. Lysates 
were purified by size-exclusion chromatography through a 1 ml column packed 
with Sephacryl S-400 gel filtration resin (GE Healthcare) and the eluate was 
separated on a 10–25% (w/v) sucrose gradient by centrifugation for 3.5 h at 
38,000 r.p.m. at 4°C in a Beckman SW41 Ti rotor. Fractions were collected from 
the top of the gradient using a peristaltic pump with a Brandel tube piercer. From 
the appropriate fractions, RNA was purified by phenol-chloroform extraction 
and ethanol precipitation and protein was precipitated with trichloroacetic acid. 
For affinity purification of HA epitope-tagged eIF3d-associated 48S initiation 
complexes from cells, three 10 cm plates of transfected 293T cells were treated 
with 100 µg ml⁻¹ cycloheximide for 5 min. Cells were washed with ice-cold PBS 
(137 mM NaCl, 2.7 mM KCl, 100 mM Na₂HPO₄, 2 mM KH₂PO₄) with 100 µg ml⁻¹ 
cycloheximide and collected in lysis buffer (20 mM HEPES-KOH pH 7.4, 150 mM 
KOAc, 2.5 mM Mg(OAc)₂, 1 mM DTT, 100 µg ml⁻¹ cycloheximide, 1% (v/v) Triton 
X-100). Lysates were centrifuged for 6 min at 12,000g at 4°C and purified by S-400 
size-exclusion chromatography. 80 µl of anti-HA antibody-conjugated agarose 
beads (Sigma) was added to the eluates, tumbled for 1.5 h at 4°C, and beads were 
washed three times with lysis buffer without Triton X-100. Bound complexes were 
eluted twice with 100 µg ml⁻¹ HA peptide and eluates were centrifuged and analysed 
in the same way as for the in vitro purification reactions. 
Quantitative real-time PCR. cDNA was reverse-transcribed from RNA using 
random hexamers and Superscript III (Invitrogen), following the manufacturer's 
protocol. Real-time PCR was performed using DyNAmo HS Sybr Green 
(ThermoFisher), with a 20 µl reaction volume containing 2 µl cDNA and 0.5 µM of 
each primer. The following oligonucleotides were used: 18S rRNA forward, 
5’-GGCCCTGTAATTGGAATGAGTC-3’, 18S rRNA reverse, 
5’-CCAAGATCCAACTACGAGCTT-3’; ACTB forward, 5’-CTCTTCCAGCCTTCCTTCCT-3’, 
ACTB reverse, 5’-AGCACTGTGTTGGCGTACAG-3’; c-Jun forward, 
5’-TGACTGCAAAGATGGAAACG-3’, c-Jun reverse, 
5’-CAGGGTCATGCTCTGTTTCA-3’. 

eIF3-RNA crosslinking and gel shift. Recombinant eIF3 was expressed and purified 
from Escherichia coli as previously described. For each crosslinking reaction, 
1 jl water, 1,11 125M labelled RNA, 11 10,.g ml! heparan sulfate (Sigma), 1 jl 
5x binding buffer (125 mM Tris-HCl, pH 7.5, 25mM Mg(OAc),, 350mM KCl, 
0.5mM CaCl, 0.5 mg ml~! BSA, 10mM TCEP), and 1 il 1.54.M purified eIF3 
were added, in the listed order, and incubated for 30 min at 25°C. For competition 
experiments, the water was substituted with 1 jul of 1! mM m’GDP/Mg?* or GDP/ 
Mg?*. UV>54-induced crosslinking was performed using a short-wave UV lamp 
placed ~4 cm above the samples on ice for 10 min. After treatment with RNase for 
10 min at 37°C, proteins were separated by 12% SDS-PAGE, the gel was dried, and 
imaged using a phosphorimager. For digestion of internal labelled RNA, 2.5 U
benzonase (Novagen) and 250 U RNase T1 (ThermoFisher) were used; for digestion
of cap labelled RNA, 4 U RNase R (Epicentre) and 1 U RiboShredder (Epicentre)
were used. For eIF3d subunit identification, after RNase treatment, samples were
denatured and immunoprecipitation was performed as previously described. For
limited proteolysis, after RNase treatment, the reactions were treated with 2 or
20 µg ml⁻¹ sequencing grade trypsin (Promega) for 30 min at 25 °C, before gel
electrophoresis. Mass spectrometry samples were prepared as previously described.
Gel-shift assays were performed as previously described, using 50 nM labelled c-Jun
stem-loop RNA and 300 nM purified eIF3 (ref. 2).
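
The dilution arithmetic implied by the crosslinking recipe above can be sketched as follows. This is an illustration only: it assumes each of the five components is added as exactly 1 µl (5 µl total) and that the eIF3 stock is 1.5 µM; the helper name `final_concentration` is hypothetical and not part of the published protocol.

```python
# Sketch: final concentrations after combining five 1 µl components (5 µl total).
# Assumptions (illustrative, not stated explicitly in the Methods): every
# addition is exactly 1 µl and the eIF3 stock is 1.5 µM (1500 nM).

def final_concentration(stock, added_ul, total_ul):
    """Concentration after diluting `added_ul` of a stock into `total_ul`."""
    return stock * added_ul / total_ul

TOTAL_UL = 5.0  # water + RNA + heparan sulfate + 5x binding buffer + eIF3

# 5x binding buffer as listed in the Methods (mM, except BSA in mg/ml)
binding_buffer_5x = {
    "Tris-HCl (mM)": 125.0,
    "Mg(OAc)2 (mM)": 25.0,
    "KCl (mM)": 350.0,
    "CaCl2 (mM)": 0.5,
    "BSA (mg/ml)": 0.5,
    "TCEP (mM)": 10.0,
}

# 1 µl of 5x buffer in 5 µl gives the 1x working concentrations
final_buffer = {
    name: final_concentration(conc, 1.0, TOTAL_UL)
    for name, conc in binding_buffer_5x.items()
}
eif3_final_nM = final_concentration(1500.0, 1.0, TOTAL_UL)

for name, conc in final_buffer.items():
    print(f"{name}: {conc:g}")
print(f"eIF3 (nM): {eif3_final_nM:g}")
```

Under these assumptions the final eIF3 concentration matches the 300 nM quoted for the gel-shift assays.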

Recombinant eIF3d protein purification. Candidate eIF3d cap-binding domain
fragments were amplified by PCR and cloned into a modified pET vector to express
an N-terminal 6×His (KSSHHHHHHGSS)-MBP-TEV fusion protein as previously
described. Extensive expression trials were conducted to determine optimal
N- and C-terminal domain boundaries and identified a minimal stable human
cap-binding domain S161–F527. Recombinant protein was expressed in BL21-RIL
DE3 E. coli cells co-transformed with a pRARE2 tRNA plasmid (Agilent).
E. coli was grown in 2×YT media at 37 °C to an OD₆₀₀ of ~0.5, cooled at 4 °C for
15 min, induced with addition of 0.5 mM IPTG and then incubated with shaking
for ~20 h at 16 °C. Pelleted cells were washed with PBS and then lysed by sonication
in lysis buffer (20 mM HEPES-KOH pH 7.5, 400 mM NaCl, 10% glycerol, 30 mM
imidazole, 1 mM TCEP) in the presence of EDTA-free Complete Protease Inhibitor
(Roche). Following centrifugation for 30 min at 23,000g and 4 °C, clarified lysate
was incubated with Ni-NTA agarose resin (QIAGEN) for 1 h at 4 °C with gentle
rocking. Resin was washed with lysis buffer supplemented to 1 M NaCl and eluted
by gravity-flow chromatography at 4 °C with lysis buffer supplemented to 300 mM
imidazole. The eluted fraction was diluted to ~50 mM imidazole and 5% glycerol,
concentrated to ~50 mg ml⁻¹ and incubated with Tobacco Etch Virus protease for
~12 h at 4 °C to remove the MBP tag. Recombinant eIF3d was isolated from free
MBP by diluting with gel-filtration buffer (20 mM HEPES-KOH pH 7.5, 250 mM
KCl, 1 mM TCEP) and passing over a 5 ml Ni-NTA column (QIAGEN) connected
in line with a 5 ml MBP-Trap column (GE Life Sciences) before additional
purification by size-exclusion chromatography on a Superdex 75 16/60 column. Final
purified eIF3d was concentrated to ~20–50 mg ml⁻¹, used immediately for
crystallography, or flash frozen in liquid nitrogen for storage at −80 °C.
Crystallization and structure determination. Initial crystals of human eIF3d were
grown at 18 °C by hanging drop vapour diffusion, but diffracted poorly. Analogous
eIF3d cap-binding domain sequences were cloned from a panel of highly
homologous animal sequences, with the equivalent domain from the parasitic wasp
N. vitripennis (S172–F537) producing the best crystals. Optimized N. vitripennis
eIF3d crystals were grown in 2 µl hanging drops set at a 1:1 ratio over 300 µl of
reservoir liquid: 200 mM (NH₄)₂SO₄, 100 mM Bis-Tris 6.5, 23–27% PEG-3350
(crystal form 1), 1.6–1.8 M ammonium citrate, pH 7.0 (crystal form 2), or 200 mM
NaCl, 100 mM Tris 8.5, 25% PEG-3350 (crystal form 3). eIF3d crystals (crystal
forms 1 and 2) were cryoprotected by covering the drop with a layer of saturated
paratone-N or NVH oil (Hampton) and crystals were transferred into the oil immersion
and cleaned using a Kozak cat whisker as previously described, or cryoprotected
by transferring to a reservoir solution supplemented with 20% ethylene glycol
(crystal form 3). Crystals were harvested with a nylon loop and then flash-frozen
in liquid nitrogen. X-ray diffraction data were collected under cryogenic
conditions at the Lawrence Berkeley National Laboratory Advanced Light Source
(beamline 8.3.1).

Data were processed with XDS and AIMLESS using the SSRL autoxds script
(A. Gonzalez, Stanford SSRL). eIF3d crystals belonged to the orthorhombic space
group P2₁2₁2₁, and contained either two copies per asymmetric unit (crystal form 1)
or one copy (crystal form 2), or the space group P2₁ and contained two copies per
asymmetric unit (crystal form 3). Experimental phase information was collected
from a native crystal using sulfur single-wavelength anomalous dispersion. Data
were collected at a minimal accessible wavelength (~7,235 eV) and iterative data
sets were completed and merged from independent portions of an exceptionally
large eIF3d crystal. After ~90× multiplicity, anomalous signal was detected
to ~2.4 Å, and a clear phase solution was obtained at ~120× multiplicity. 35 sites
were identified with HySS in PHENIX corresponding to 32 sulfur atoms in
eIF3d and 3 chloride positions. Phases were extended to the native eIF3d data set
processed to ~1.40 Å using SOLVE/RESOLVE, and model building was completed
in Coot before refinement with PHENIX. X-ray data for refinement were
extended according to an I/σ resolution cut-off of ~1.5, CC* correlation and
Rpim parameters, and visual inspection of the resulting map. A completed eIF3d
cap-binding domain from crystal form 1 was used as a search model to determine
phases for crystal form 2 and 3 using molecular replacement. Final structures were
refined to stereochemistry statistics for Ramachandran plot (favoured/allowed),
rotamer outliers, and MolProbity score as follows: crystal form 1, 96.8%/3.2%, 0.2%
and 1.40; crystal form 2, 97.3%/2.7%, 0% and 1.29; crystal form 3, 97.4%/2.6%,
0.9% and 1.26.
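
The S-SAD collection energy quoted above (~7,235 eV) can be related to an X-ray wavelength with λ = hc/E. The short sketch below illustrates that conversion; the function names are hypothetical, and the constant hc ≈ 12398.4198 eV·Å is the standard physical value rather than anything taken from the paper.

```python
# Sketch: convert X-ray photon energy to wavelength and back (lambda = h*c / E).
HC_EV_ANGSTROM = 12398.4198  # h*c in eV·Å (standard physical constant)

def energy_to_wavelength(energy_ev):
    """Wavelength in Å for a photon energy in eV."""
    return HC_EV_ANGSTROM / energy_ev

def wavelength_to_energy(wavelength_angstrom):
    """Photon energy in eV for a wavelength in Å."""
    return HC_EV_ANGSTROM / wavelength_angstrom

# ~7,235 eV is the approximate S-SAD energy given in the text
sad_wavelength = energy_to_wavelength(7235.0)
# 1.11586 Å is the wavelength listed for the native data sets
native_energy = wavelength_to_energy(1.11586)

print(f"S-SAD wavelength: {sad_wavelength:.4f} Å")
print(f"Native energy: {native_energy:.0f} eV")
```

The computed S-SAD wavelength is close to the 1.71371 Å listed for that data set in Extended Data Table 1, as expected for an energy reported only approximately.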

Recombinant eIF4E protein purification and RNA crosslinking. Full-length
human eIF4E was cloned and expressed using the same protocol as for eIF3d.
eIF4E-RNA crosslinking was performed as described for eIF3-RNA crosslinking,
but using 25 nM RNA with normalized counts per million and the indicated
concentration of eIF4E. RNase treatment was performed using 4 U RNase R and
250 U RNase T1.
Extended Data Figure 1 | c-Jun expression is unaffected by 4E-BP1
activation. Representative western blot of 293T cells after 24h treatment 
with mTOR inhibitor INK128. The results are representative of three 
independent experiments. For gel source data, see Supplementary Fig. 1. 
[Extended Data Figure 2c panel: human eIF3d (NP_003744) protein sequence.]


Extended Data Figure 2 | Mapping of a C-terminal region of eIF3d
that protects the c-Jun 5′ cap structure. a, Validation of eIF3d
subunit identification. eIF3d-cap crosslinking was validated by
immunoprecipitation of eIF3d after crosslinking and denaturing the
eIF3 complex by boiling in SDS. The result is representative of biological
replicates. b, Limited proteolysis of eIF3 crosslinked to ³²P-cap-labelled
c-Jun 5′ UTR RNA. Full-length and proteolysis fragments of eIF3d are
indicated by black and maroon arrows, respectively, on the phosphorimage
and Coomassie-stained SDS gels. c, Mass spectrometry identification of
trypsinized peptides from limited proteolysis of cap-crosslinked eIF3d.
Identified peptides are highlighted in blue. The results in b and c are
representative of three independent experiments.
Extended Data Figure 3 | Purification of eIF3d cap-binding domain. a, Alignment is coloured by phylogenetic conservation of amino acid
physicochemical property similarity and a cartoon schematic of the eIF3d secondary structure is depicted below the sequences. Colouring begins
at 30% conservation (lightest blue). b, Coomassie-blue-stained SDS gel of recombinant N. vitripennis eIF3d S172–F537 protein expressed in E. coli.
Extended Data Figure 4 | Structure-based alignment of eIF3d cap-binding
domain and DXO cap-endonuclease domain sequences.
Structure-based alignment of eIF3d and DXO sequences according to
superposition of eIF3d and DXO (PDB 4J7L) structures. Alignment is
coloured by phylogenetic conservation as in Extended Data Fig. 2, and
cartoon schematics of the secondary structures are depicted below the
sequences. eIF3d is coloured in blue and magenta as in Fig. 2, and DXO is
coloured in green and magenta.
Extended Data Figure 5 | Structural details of eIF3d ‘RNA gate’
stabilizing interactions. a–c, Structural overview of eIF3d with cut-away
sections highlighting charged interactions stabilizing the closed ‘RNA
gate’ conformation. No significant van der Waals interactions stabilize
the closed gate conformation, supporting likely repositioning of the RNA
gate before 5′ mRNA cap recognition. Charged interactions occur in three
areas: a, at the beginning of the gate insertion sequence (gate beginning);
b, at the tip of the unstructured loop (gate tip); and c, at an ‘arginine
anchor’ point stabilizing the return of the loop insertion sequence to the
α-helix shared with DXO family endonucleases. Residues are numbered
according to the human eIF3d sequence, and all positions are conserved
between human and N. vitripennis except S292N. eIF3d RNA gate residues
are displayed with blue side chains and the residues making stabilizing
contacts are coloured in green. 2Fo − Fc map regions are shown at 1.5σ.
[Extended Data Figure 6 panels: a, crystal form 1 (P2₁2₁2₁); b, crystal form 2 (P2₁2₁2₁); c, crystal form 3 (P2₁). Panels label the eIF3d molecules and the symmetry molecules.]

Extended Data Figure 6 | Packing interactions observed in alternative
eIF3d crystal forms. Cartoon representation of crystallographic packing
in eIF3d crystal form 1, 2 and 3 (a, b and c). Crystal forms 1 and 3 have
two copies of eIF3d in the asymmetric unit coloured in blue/magenta and
green/magenta, respectively; crystal form 2 has only one copy of eIF3d
per asymmetric unit. Symmetry-related molecules are depicted in grey.
Cut-away zoom illustrates position of the eIF3d RNA gate (red) relative
to the nearest symmetry-related molecule. In crystal form 1, the RNA
gate is packed against a neighbouring symmetry molecule, but in crystal
forms 2 and 3, the RNA gate is positioned towards a major solvent channel.
Relative conformation of the RNA gate remains unchanged in either eIF3d
crystal form.
Extended Data Figure 7 | eIF3d cap-binding activity requires the
eIF3-recruitment stem-loop RNA. Phosphorimage of SDS gel resolving
RNase-protected ³²P-cap-labelled c-Jun stem-loop RNA crosslinked
to eIF3 subunits. The result is representative of three independent
experiments.
Extended Data Figure 8 | Incorporation of HA epitope-tagged eIF3d
into translation initiation complexes. a, Coomassie-blue-stained SDS
gel of recombinant eIF3 containing wild-type or helix α5- or α11-mutated
eIF3d. b, Representative native agarose gel electrophoresis of recombinant
wild-type and mutant eIF3 complexes bound to the c-Jun stem-loop.
c, Polysome profiles of untransfected 293T cells, plotted as relative
absorbance at 254 nm versus elution fractions. d, Western blot analysis of
eIF3d and the small (rpS19) and large (rpLP0) ribosomal subunits. The
results in b–d are representative of three independent experiments. For gel
source data, see Supplementary Fig. 1.
Extended Data Figure 9 | eIF4E recognizes the 5′ end of the c-Jun
mRNA less efficiently than ACTB mRNA. a, Coomassie-blue-stained SDS
gel of recombinant human eIF4E expressed in E. coli. b, Phosphorimage
of SDS gel resolving RNase-protected ³²P-cap-labelled ACTB or c-Jun
5′ UTR RNA crosslinked to eIF4E. The result is representative of three
independent experiments. For gel source data, see Supplementary Fig. 1.
Extended Data Table 1 | Summary of data collection, phasing and refinement statistics

                    eIF3d                    eIF3d                    eIF3d                    eIF3d
                    Crystal Form 1           Crystal Form 2           Crystal Form 3           (S-SAD)
                    (5K4B)                   (5K4C)                   (5K4D)

Data collection
Space group         P2₁2₁2₁                  P2₁2₁2₁                  P2₁                      P2₁2₁2₁
Cell dimensions
  a, b, c (Å)       61.90, 62.93, 192.24     49.01, 61.84, 138.31     49.97, 144.32, 55.30     61.66, 62.72, 193.28
  α, β, γ (°)       90.0, 90.0, 90.0         90.0, 90.0, 90.0         90.0, 109.12, 90.0       90.0, 90.0, 90.0
Wavelength          1.11586                  1.11586                  1.11587                  1.71371
Resolution (Å)*     48.06–1.40 (1.42–1.40)   46.20–1.70 (1.73–1.70)   49.13–2.00 (2.05–2.00)   38.66–1.92 (1.96–1.92)
Rpim                2.6 (45.3)               5.0 (45.0)               8.6 (52.3)               0.9 (8.1)
I/σ(I)              12.9 (1.5)               9.1 (1.6)                6.5 (1.4)                65.2 (6.7)
CC₁/₂               99.9 (57.9)              99.6 (63.9)              98.8 (55.6)              100 (94.7)
Completeness (%)    100 (99.2)               99.4 (89.4)              99.8 (98.3)              98.5 (86.1)
Redundancy          15.8 (10.1)              4.1 (3.6)                3.0 (2.7)                129.0 (46.6)

Refinement
Resolution (Å)      48.06–1.40               46.20–1.70               49.13–2.00
No. reflections
  Total             2,349,428                193,627                  147,613
  Unique            148,455                  47,123                   49,757
  Free (%)          2                        5                        5
Rwork / Rfree       17.5 / 19.3              16.2 / 19.6              17.7 / 20.8
No. atoms
  Protein           5837                     2953                     5686
  Ligand/ion        3 (Cl)                   18 (glycerol)            -
  Water             910                      385                      741
B factors
  Protein           21.7                     24.2                     26.6
  Ligand/ion        31.7                     29.1                     -
  Water             33.1                     36.7                     34.3
r.m.s. deviations
  Bond lengths (Å)  0.007                    0.012                    0.004
  Bond angles (°)   1.167                    1.359                    0.638

Single crystals were used to collect data for each structure.
*Values in parentheses are for highest-resolution shell.
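
As a quick consistency check on the table above, the average multiplicity (redundancy) of a data set can be estimated as total observations divided by unique reflections. The sketch below applies that to the three refined data sets. It assumes that the "Total" row counts all measured observations, which the table does not state explicitly, and the function name `multiplicity` is illustrative.

```python
# Sketch: average multiplicity = total observations / unique reflections.
# Assumption: "Total" in Extended Data Table 1 counts all measured observations.
datasets = {
    "crystal form 1": {"total": 2_349_428, "unique": 148_455, "reported": 15.8},
    "crystal form 2": {"total": 193_627, "unique": 47_123, "reported": 4.1},
    "crystal form 3": {"total": 147_613, "unique": 49_757, "reported": 3.0},
}

def multiplicity(total, unique):
    """Average number of observations per unique reflection."""
    return total / unique

computed = {
    name: multiplicity(d["total"], d["unique"]) for name, d in datasets.items()
}

for name, value in computed.items():
    reported = datasets[name]["reported"]
    print(f"{name}: computed {value:.2f}, reported {reported}")
```

Under that assumption, each computed ratio agrees with the reported redundancy to within rounding.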
The structural basis of modified nucleosome recognition by 53BP1

Marcus D. Wilson, Samir Benlekbir, Amélie Fradet-Turcotte, Alana Sherker, Jean-Philippe Julien, Andrea McEwan,
Sylvie M. Noordermeer, Frank Sicheri, John L. Rubinstein & Daniel Durocher


DNA double-strand breaks (DSBs) elicit a histone modification 
cascade that controls DNA repair. This pathway involves the
sequential ubiquitination of histones H1 and H2A by the E3
ubiquitin ligases RNF8 and RNF168, respectively. RNF168
ubiquitinates H2A on lysine 13 and lysine 15 (refs 7, 8) (yielding
H2AK13ub and H2AK15ub, respectively), an event that triggers
the recruitment of 53BP1 (also known as TP53BP1) to chromatin
flanking DSBs. 53BP1 binds specifically to H2AK15ub-
containing nucleosomes through a peptide segment termed the
ubiquitination-dependent recruitment motif (UDR), which
requires the simultaneous engagement of histone H4 lysine 20
dimethylation (H4K20me2) by its tandem Tudor domain. How
53BP1 interacts with these two histone marks in the nucleosomal 
context, how it recognizes ubiquitin, and how it discriminates 
between H2AK13ub and H2AK15ub is unknown. Here we present 
the electron cryomicroscopy (cryo-EM) structure of a dimerized 
human 53BP1 fragment bound to a H4K20me2-containing and 
H2AK15ub-containing nucleosome core particle (NCP-ubme) 
at 4.5 Å resolution. The structure reveals that H4K20me2 and
H2AK15ub recognition involves intimate contacts with multiple 
nucleosomal elements including the acidic patch. Ubiquitin 
recognition by 53BP1 is unusual and involves the sandwiching of 
the UDR segment between ubiquitin and the NCP surface. The 
selectivity for H2AK15ub is imparted by two arginine fingers in 
the H2A amino-terminal tail, which straddle the nucleosomal DNA 
and serve to position ubiquitin over the NCP-bound UDR segment. 
The structure of the complex between NCP-ubme and 53BP1 reveals 
the basis of 53BP1 recruitment to DSB sites and illuminates how 
combinations of histone marks and nucleosomal elements cooperate 
to produce highly specific chromatin responses, such as those 
elicited following chromosome breaks. 

To elucidate the molecular basis of 53BP1 recruitment to the 
chromatin surrounding DNA breaks, we sought to determine the 
three-dimensional structure of a complex consisting of 53BP1 
bound to an NCP modified with H4K20me2 and H2AK15ub. We 
incorporated a dimethyl-lysine mimic by cysteine alkylation of a 
histone H4(K20C) mutant, yielding H4Kc20me2 (Extended Data
Fig. 1a, b). In parallel, we produced H2AK15ub by enzymatic ubiquitination
of H2A(K13R and K36R) in the context of a H2A-H2B dimer
(Extended Data Fig. 1c-g). The K36R mutation prevented off-target
ubiquitination at this residue (Extended Data Fig. 1c; H2A(K13R and
K36R) is hereafter referred to as H2A). Re-purified H4Kc20me2 and
H2AK15ub were reconstituted into histone octamers and then wrapped
with a 145 base pair fragment of strong nucleosome-positioning ‘601’
DNA to form NCP-ubme (Extended Data Fig. 1h).

The minimal 53BP1 fragment recruited to DSB sites consists of a 
dimer of the tandem Tudor-UDR segment (residues 1484-1631,
Extended Data Fig. 2a). We used glutathione S-transferase (GST) as
a dimerization module to produce a Tudor-UDR dimer (hereafter
termed GST-53BP1, Extended Data Fig. 2a). GST-53BP1 bound to
NCP-ubme with a Kd of 141M, with only an approximately twofold
reduction in association compared to a 53BP1 fragment containing the
native oligomerization domain (Extended Data Fig. 2c-e). The resulting
GST-53BP1 protein formed a stable complex with NCP-ubme with
a 2:1 (53BP1:NCP-ubme ratio) stoichiometry (Fig. 1a and Extended
Data Fig. 2e, g).

We determined the structure of the NCP-ubme-GST-53BP1 
complex by single particle cryo-EM at 4.5 Å resolution (Fig. 1b, c and
Extended Data Fig. 3). The rigid nucleosome forms a symmetrical 
coin shape, consistent with other NCP-protein structures, with
extra density attributable to ubiquitin and 53BP1 on each face of the 
NCP. The densities from the DNA and core secondary structural ele- 
ments of the histones were unmistakable and similar to previous NCP 
models (root mean squared deviation (r.m.s.d.) 0.95 Å). The NCP
density was sufficiently detailed to model several bulky side-chains 
(Extended Data Fig. 3g) and the path of the H2A N-terminal tail (see 
below). Ubiquitin was fit unambiguously into the cryo-EM map, linked 
via an isopeptide bond to H2A Lys15. No extra density was visualized 
for the GST moiety, which is probably located near the nucleosome 
dyad (Extended Data Fig. 3h). The cryo-EM density for the tandem 
Tudor domain of 53BP1 was weaker than the rest of the structure, but 
its centre of mass was fixed over the tail of H4, consistent with its
tethering to H4Kc20me2 (Fig. 1b).

We attributed extra density projecting from the tandem Tudor 
domain to residues 1611-1631 of the 53BP1 UDR. This region traverses 
approximately 50 Å across the NCP, forming contacts with both ubiquitin
and histone surfaces (Fig. 1b-d). From its N terminus, the UDR
snakes along the NCP-ubme surface, first across a solvent-exposed cleft 
formed between H4 and H2B, and then through a channel formed 
between the hydrophobic patch of ubiquitin and the NCP exterior. The 
UDR continues over the H2B αC helix, over Lys108 and His109, and
terminates in a predicted helical conformation above the H2A-H2B 
acidic patch. Although we were unable to unambiguously assign the 
UDR density, we deduced the sequence register of the UDR based on 
complementarity to the local physicochemical environment and by 
mutational studies described below (Fig. 1d). 

We also determined the structure of the unliganded NCP-ubme at 
7.7 Å resolution (Extended Data Fig. 4 and Fig. 2a). Comparison of
ubiquitin in the bound and unbound states (Fig. 2b, c) suggests that 
ubiquitin becomes rigidly constrained upon 53BP1 binding following 
interaction with two separate interfaces: one containing the H2B αC
helix and the other involving the UDR of 53BP1 (Fig. 2c, d). 

To facilitate validation of the NCP-ubme-GST-53BP1 structure, 
we substituted enzymatic ubiquitination with chemical ubiquitination 
Figure 1 | Architecture of the NCP-ubme-GST-53BP1 complex.
a, Size-exclusion chromatography multi-angle light scattering
(SEC-MALS) performed on preformed NCP-ubme-GST-53BP1
complex compared to NCP-ubme or GST-53BP1 alone. b, 3D cryo-EM
map of the NCP-ubme-GST-53BP1 complex. The map was segmented
and coloured according to the respective components. Weaker density


of H2A using a K15C mutant (Extended Data Fig. 5a–c). We first
assessed the relevance of the ubiquitin interaction with the H2B αC
helix, which encompasses H2B residues Lys116, Thr119 and Ser123
(Fig. 2d, left and Extended Data Fig. 5d). As predicted, mutation of H2B
Thr119 to alanine (H2B(T119A)) or glutamate (H2B(T119E)), a
phosphomimetic residue, greatly reduced binding of GST-53BP1 to
NCP-ubme (Extended Data Fig. 5e-g). As H2B Thr119 phosphorylation is
detected during meiosis, our results suggest that this histone mark
could inhibit 53BP1 recruitment to DSBs. Lys120 of H2B lies in close
proximity to the GST-53BP1-bound ubiquitin (Fig. 2d, left). This residue
is ubiquitinated during transcriptional elongation and DSB induction,
and may antagonize 53BP1 recruitment to chromatin. We
Figure 2 | The molecular basis of H2AK15ub recognition by 53BP1.
a, Overlay of the NCP-ubme, and NCP-ubme-GST-53BP1 structures.
The cyan density is displayed with a higher threshold. b, Equivalent
sections through maps of NCP-ubme (left) and NCP-ubme-GST-53BP1
(right), coloured according to local resolution estimates. c, Enlarged
view of the region above the H2A Lys15 isopeptide bond with ubiquitin
in both structures. d, Ubiquitin interactions with the αC helix of H2B
(left). Magnified view of the H2AK15ub isopeptide bond, with flanking
arginine-finger DNA-interacting residues indicated (right).
threshold. ub, ubiquitin. c, Model of the NCP-ubme-GST-53BP1
complex. d, Secondary structure model and sequence of the UDR 
region. Sites of NCP and ubiquitin interaction are indicated. Amino 
acid conservation is indicated (white, poorly conserved; black, highly 
tested whether 53BP1 binding tolerates H2BK120 ubiquitination and
observed that NCPs containing both H2BK120ub and H2AK15ub are 
efficiently bound by GST-53BP1 (Extended Data Fig. 5h). Therefore it 
is unlikely that H2BK120 ubiquitination directly blocks 53BP1 recruit- 
ment to chromatin. 

RNF168 targets both H2A Lys13 and Lys15 in response to DSBs,
but only H2AK15ub is recognized by 53BP1 (ref. 10). The NCP-ubme- 
GST-53BP1 structure hints at the basis for this specificity, which 
probably involves an interaction between DNA and the N-terminal tail 
of H2A (Fig. 2d, right). The H2A residues Arg11 and Arg17 straddle the 
DNA at superhelical location 4.5, placing the Lys15-tethered ubiquitin 
in a position to interact with the UDR. A similarly ordered H2A tail 
has been observed previously, suggesting it can adopt this conformation
independently of ubiquitination. Consistent with the H2A-tail
conformation underpinning 53BP1 binding specificity, shifting the 
ubiquitination site by 7 Å to H2AK13 reduced binding to GST-53BP1
(Extended Data Fig. 6a–c), and this binding was restored by a compensatory
shift of the two arginine ‘fingers’ in the H2A N-terminal tail
sequence (Extended Data Fig. 6c and ref. 10). 

The 28-residue UDR, C-terminal to the tandem Tudor domain, 
confers both ubiquitin and NCP recognition to 53BP1. Cross-linking 
experiments indicated that the C terminus of the UDR lies near the 
nucleosome acidic patch, while the N-terminal residues lie near to the 
H2B-H4 cleft (Extended Data Fig. 7a, b). The Sir3 BAH domain also 
interacts with the H2B-H4 cleft, suggesting that this surface may be
commonly exploited by NCP-binding proteins (Extended Data Fig. 7c). 

In addition to H2B-ubiquitin interactions (Fig. 2d, left), direct 
interactions between the 53BP1 UDR and ubiquitin promote the 
constrained ubiquitin conformation. However, as the UDR shows 
no measurable affinity for free ubiquitin, these interactions are
dependent on UDR binding to the NCP. Ubiquitin presents its hydro- 
phobic patch (comprising Leu8, Ile44 and Val70) towards the middle of 
the modelled UDR, centred on the aliphatic side chains of Ile1617 and 
Leu1619 of 53BP1 (Fig. 3a, left), which are also key residues for 53BP1 
recruitment to DSB sites¹⁰ (Fig. 3b and Extended Data Fig. 8a, b). The
I44A mutation in ubiquitin also reduced binding of GST-53BP1 to
NCP-ubme, as observed previously (Extended Data Fig. 8c-e). The
Lys6 and His68 side-chains of ubiquitin lie in proximity to two UDR 
residues, Asp1616 and Asp1620. Mutation of both acidic residues to 
lysine (D1616K and D1620K) reduced 53BP1 binding to NCP-ubme 
and impaired recruitment of 53BP1 to DSB sites (Fig. 3b, c), while 
the K6E and H68E ubiquitin mutations impaired 53BP1 binding 
to NCP-ubme (Extended Data Fig. 8c-e). Collectively, these results 
confirm that the unusual mode of ubiquitin recognition by 53BP1 
involves the sandwiching of the UDR segment by the NCP surface 
and the ubiquitin moiety. They also suggest that K6-linked ubiqui- 
tin chains, if they occur on H2AK15, are incompatible with 53BP1 
binding. However, as most other lysine residues of ubiquitin project 
Figure 3 | Multivalent recognition of NCP-ubme by the 53BP1 UDR.
a, UDR-ubiquitin interaction interface (left). Enlarged view of UDR-acidic
patch interaction site coloured according to coulombic surface charge
(right). b, Bio-layer interferometry traces of GST-53BP1 variants at a
single concentration binding to immobilized NCP-ubme. WT, wild-type
GST-53BP1 protein. c, Immunofluorescence analysis of U2OS cells depleted
of 53BP1 by siRNA and transfected with GFP-53BP1(1220-1631) variants,
1 h post-irradiation (2 Gy). A representative cell is shown. Scale bar, 5 µm.
Quantification of the percentage of cells with more than 10 GFP (53BP1)
foci colocalizing with γH2AX is shown as mean ± s.e.m. (n = 4). d, Bio-layer
interferometry traces of GST-53BP1 with NCP-ubme containing proposed
UDR-interacting H2A and H2B variants. WT, wild-type histone proteins.
e, Bio-layer interferometry traces of proposed acidic patch-interacting
GST-53BP1 UDR variants with NCP-ubme. NB, no detectable binding.


away from the NCP-ubme-GST-53BP1 complex (Extended Data 
Fig. 8f), the 53BP1-NCP-ubme interaction may be permissive to 
H2AK15 poly-ubiquitination. 

The acidic patch acts as a nexus for protein binding to the
nucleosome. 53BP1 also takes advantage of this
surface (Fig. 3a, right and Extended Data Fig. 9a), and its association
to NCP-ubme can be outcompeted by the acidic patch-binding
LANA peptide (Extended Data Fig. 9b). Removal of the negative
charge in the acidic patch, through H2A(E61A, E91A and E92A) 
and H2B(E105A) mutations, reduced GST-53BP1 binding to NCP- 
ubme (Fig. 3d and Extended Data Fig. 9c, d). The C-terminal portion 
of the UDR is highly basic (Fig. 1d) and, in the modelled UDR, 
residues within the C-terminal α-helix are positioned to interact
directly with the acidic patch (Fig. 3a, right). Mutation of any of these 
residues (K1626A, R1627A, K1628C and K1629A) reduced binding 
of GST-53BP1 to NCP-ubme (Fig. 3e and Extended Data Fig. 9e, f). 
In particular, the 53BP1(R1627A) mutation, which also disables 
recruitment to DSB sites¹⁰, had the most pronounced effect on binding,
suggesting that this residue acts as the ‘arginine-anchor’ (Extended
Data Fig. 9f, g). Mutation of UDR contact elements immediately 
adjacent to the acidic patch, Lys108 and His109 of H2B (Extended 
Data Fig. 9a), also decreased the affinity of GST-53BP1 for NCP-ubme 
(Fig. 3d), emphasizing the role of this interaction surface for 53BP1 
binding to chromatin. 

The cryo-EM density above the methylated H4 tail was weaker than 
that of the rest of the NCP-ubme-GST-53BP1 structure (Fig. 4a, b) 
and did not readily fit the crystal structure of the 53BP1 tandem Tudor 
domain. Different structures for this region were obtained from
Figure 4 | Flexible association of 53BP1 tandem Tudor domain with
NCP-ubme. a, Magnified view of GST-53BP1 tandem Tudor density
over the methylated tail of histone H4. Structure of 53BP1(1484-1601)
and H4 peptide (Protein Data Bank accession code 11G0) was fitted.
b, Indicated sections through the NCP-ubme-GST-53BP1 structure (left),
from the calculated map (middle); or coloured according to resolution
(right), scale bar, 25 Å. c, Bio-layer interferometry traces comparing
GST-53BP1(I1617A) binding to NCP-ubme or equal concentrations
of H4Kc20 dimethylated or unmethylated peptides. A single
GST-53BP1(I1617A) association concentration (2 µM) is shown.


distinct three-dimensional (3D) classes during image processing, with 
density in the shape of ‘stem and blossom’ features that we attribute to
the site of engagement of H4Kc20me2 (Extended Data Fig. 10a). In all 
3D classes, the H4 tail projects upwards, away from the NCP surface, 
where it engages the methyl-lysine binding pocket of the tandem Tudor 
domain (Fig. 4a). As the GST-53BP1(I1617A) protein, which is unable
to bind to H2AK15ub (Fig. 3b), binds to both a short dimethylated 
H4 peptide and to NCP-ubme with comparable affinity (Fig. 4c and 
Extended Data Fig. 10b, c), we surmise that the tandem Tudor domain 
interacts primarily with a short patch of the H4 tail independently of 
other nucleosomal elements. 

Finally, we tested binding of a 53BP1 fragment that harboured its 
native oligomerization domain instead of GST to NCP-ubme variants. 
Each of the elements of NCP-ubme necessary for GST-53BP1 binding
is similarly necessary for binding to the native 53BP1 oligomers 
(Extended Data Fig. 10d). These results indicate that the binding 
mode described for the GST-53BP1-NCP-ubme structure accurately 
represents the binding of 53BP1 to nucleosomes. 

In summary, the structure of 53BP1 bound to a methylated and 
ubiquitinated nucleosome highlights how a highly specific biological 
response, such as the recruitment of 53BP1 to DSB sites, is built from 
the multivalent engagement of histone post-translational modifications. 
Far from being the consequence of independent binding events to 
modified histone residues, the binding of 53BP1 to NCPs requires 
the coordinated involvement of multiple nucleosomal elements. We 
also speculate that the intimate association of the 53BP1 dimer with 
the nucleosome, coupled to its pincer-like binding mode that involves 
both faces of the nucleosome, may play a key role in stabilizing the 
nucleosomal barrier that inhibits DNA end resection.
METHODS


Data reporting. No statistical methods were used to predetermine sample size. The 
experiments were not randomized. The investigators were not blinded to allocation 
during experiments and outcome assessment. 

Plasmids. Bacterial expression vectors of Xenopus laevis histone H3, H3(C110A) 
and H4(K20C) are described in ref. 10. pET28a synthetic human H2A.1 and 
pET28a human H2B.1 were a gift from Joe Landry (Addgene plasmids numbers 
42634 and 42630, respectively). Expression plasmids for HisTEV-ub
pProEx-6×His-TEV-ubiquitin and HisTEV-ub G76C pProEx-6×His-TEV-ubiquitin G76C
were described in refs 10, 31. Plasmids encoding UBA1, UbcH5a and RNF168(1-113)
and pcDNA5-NLS-GFP-53BP1 190-1631 were described previously. pUC57-8×
145 bp Widom-601 was a gift of Curt Davey, Nanyang Technical University.

The human 53BP1 DNA sequence corresponding to residues 1220-1631 was 
codon-optimized for bacterial expression by GeneArt (ThermoFisher). The native 
recruitment region, 53BP1(1220-1631), was amplified and cloned into a modified 
pETM-30-02 GST-TEV vector described in ref. 10. DNA for the tandem Tudor 
and UDR segment of 53BP1 (corresponding to residues 1484-1631) was amplified 
from the codon-optimized sequence by PCR and inserted into a modified pET24b, 
GST-expressing vector with a C-terminal 6×His tag, creating the GST-53BP1 
expression vector. All mutations described in the manuscript were introduced by 
site-directed mutagenesis (QuikChange; Stratagene), or via direct DNA synthesis 
in gBlocks (IDT Technologies) and subsequent sub-cloning. All plasmids were 
sequenced for verification. 
Protein production. RNF168(1-113), UbcH5a and Uba1 were expressed 
and purified as described previously’®. To remove the 6×His tag from 
RNF168(1-113) and UbcH5a, the pure proteins were cleaved with 6×His-TEV 
protease at room temperature for 4 h in TEV cleavage buffer (50 mM Tris-Cl 
pH 8, 150 mM NaCl, 2 mM Na-citrate, 4 mM β-mercaptoethanol) and Ni²⁺ 
affinity subtracted (HiTrap chelating resin, GE Healthcare) to remove 6×His-TEV 
protease and uncleaved 6×His-tagged protein. Recombinant histones were purified 
essentially as described’. 

GST-53BP1 was produced in E. coli BL-21 DE3 CodonPlus cells, after induction 
with 200 µM isopropyl β-D-1-thiogalactopyranoside (IPTG) for 18 h at 16 °C. 
Cells were lysed by sonication and lysozyme treatment in recombinant protein 
buffer (25 mM Na-phosphate buffer pH 7.4, 300 mM NaCl, 0.1% Triton (v/v), 10% 
glycerol (v/v), 4 mM β-mercaptoethanol, 1× protease inhibitor mix (284 ng ml⁻¹ 
leupeptin, 1.37 µg ml⁻¹ pepstatin A, 170 µg ml⁻¹ phenylmethylsulfonyl fluoride and 
330 µg ml⁻¹ benzamidine), 5 µg ml⁻¹ DNase I). Clarified lysate was applied to 
a column of Fast-Flow glutathione sepharose (GE Healthcare). After extensive 
washing in high (350 mM NaCl) and low (50 mM NaCl) salt buffers, the protein 
was eluted with 25 mM reduced glutathione in 25 mM Tris-Cl pH 8, 150 mM NaCl, 
10% glycerol (v/v), 4 mM β-mercaptoethanol. The eluate was concentrated with a 
30K MWCO centrifugation device (Amicon) and further purified on either S200 
10/300 (GE Healthcare) or SEC650 Enrich (BioRad) gel filtration columns in GF 
buffer (25 mM Tris-Cl pH 7.5, 125mM NaCl, 2mM DTT, 5% glycerol (v/v)). All 
GST-53BP1 variants, including Tudor, UDR mutants and GST-34-53BP1, eluted 
primarily as dimeric species on size-exclusion chromatography. Protein-containing 
fractions were concentrated, flash-frozen in liquid nitrogen and stored at −80 °C. 
For isothermal titration calorimetry (ITC) experiments, proteins were dialysed 
extensively into ITC buffer (20 mM HEPES pH 7.5, 200mM KCl, 2mM Tris 
(2-carboxyethyl)phosphine (TCEP), 0.5 mM EDTA). 

53BP1(1220-1631) was purified by glutathione affinity chromatography 
as described above. GST-TEV-53BP1(1220-1631) was cleaved from the resin 
overnight using TEV protease cleavage in TEV cleavage buffer. The eluate was 
concentrated with a 30K MWCO centrifugation device (Amicon) and further 
purified on a SEC650 Enrich (BioRad) gel filtration column in GF buffer. Multiple 
peaks containing primarily 53BP1(1220-1631) were observed. The main 
protein-containing peak was collected, concentrated, flash-frozen in liquid nitrogen and 
stored at −80 °C. 

HisTEV-ub variants were produced in E. coli BL-21 DE3 CodonPlus cells, lysed 
in recombinant protein buffer, with the NaCl adjusted to 500 mM. Clarified cell 
lysate was loaded onto a HiTrap chelating column (GE Healthcare) pre-loaded 
with Ni²⁺ ions. After extensive washing, HisTEV-ub was eluted using a gradient 
of imidazole and peak-protein containing fractions were concentrated using a 3K 
MWCO centrifugation device (Amicon). HisTEV-ub was further purified on a 
S75 10/300 column in GF buffer. Protein-containing fractions were either con- 
centrated by centrifugation or dialysed into water supplemented with 1 mM acetic 
acid before lyophilization. 

All protein concentrations were determined via absorbance at 280 nm using 
a NanoDrop 8000 (Thermo Scientific), followed by SDS-PAGE and InstantBlue 
(Expedeon) staining with comparison to known amounts of control proteins. 
H4KC20 labelling. Cysteine-engineered histone H4(K20C) protein was alky- 
lated essentially as described previously!**. Briefly, pure histone H4 was 
reduced with DTT before addition of a 50-fold molar excess of (2-chloroethyl) 
dimethylammonium chloride (Sigma-Aldrich). The reaction was allowed 
to proceed for 4 h at room temperature before quenching with 5 mM 
β-mercaptoethanol. The H4 protein was separated and desalted using a PD-10 
desalting column (GE Healthcare) pre-equilibrated in water supplemented with 
2 mM β-mercaptoethanol and lyophilized. Correct incorporation of alkylation 
agents was assessed by 1D intact weight ESI mass spectrometry (AIMS, 
Department of Chemistry, University of Toronto; see Extended Data Fig. 1b, 
additional details below). 

Catalytic ubiquitination of H2A. H2A was ubiquitinated within a H2A-H2B 
dimer, the minimal complex required for K15 specificity'*. The ubiquitination 
reaction was performed in ubiquitination reaction buffer (50 mM Tris-Cl pH 7.5, 
120 mM NaCl, 5 mM MgCl₂, 1 µM ZnCl₂, 5 mM ATP, 1 mM DTT) with 120 nM 
Uba1 (E1), 500 nM UbcH5a (E2), 500 nM RNF168(1-113) (E3), 12.5 µM HisTEV-Ub 
and 4 µM H2A-H2B at 30 °C for 90 min. The reaction was stopped either by 
addition of SDS loading buffer or an equal volume of quench buffer (9 M urea, 
20 mM Tris-Cl pH 8, 80mM NaCl, 2mM EDTA). 

The large-scale preparation of catalytically ubiquitinated H2A (Ub-H2Acat) 
was purified by loading the urea-quenched reaction mix on a HiTrap SP column 
(GE Healthcare) using a peristaltic pump. The loaded column was connected to 
an FPLC (GE Healthcare) and washed with urea buffer A (7 M urea, 50 mM NaCl, 
20 mM Tris-Cl pH 6.8, 2 mM β-mercaptoethanol), and bound histones were eluted 
using a gradient of urea buffer B (7 M urea, 1 M NaCl, 20 mM Tris-Cl pH 8.8, 2 mM 
β-mercaptoethanol). HisTEV-H2AK15ub-containing fractions were pooled and 
enriched over a HiTrap chelating column (GE Healthcare) pre-loaded with Ni²⁺ 
ions. Extensive washing in urea Ni²⁺ buffer (5.6 M urea, 20 mM Tris-Cl pH 8, 
400 mM NaCl, 15 mM imidazole, 2 mM β-mercaptoethanol) removed 
non-specifically bound unmodified histones. HisTEV-H2AK15ub was eluted in urea 
Ni²⁺ buffer, supplemented with 300 mM imidazole. The Ni²⁺ column eluate was 
diluted to a final concentration of 1 M urea, 200 mM NaCl and 55 mM imidazole 
and the 6×His tag at the N terminus of ubiquitin was removed by TEV cleavage 
overnight at 4 °C. TEV protease and uncleaved HisTEV-H2AK15ub were removed 
by Ni²⁺ subtraction and the resulting flow-through was dialysed extensively in 
water containing 2 mM β-mercaptoethanol, before lyophilization. H2AK15ub was 
refolded into histone octamers at an equimolar ratio to the other core histones, 
essentially as described previously. 

Chemical ubiquitination of H2A. Mutant H2A engineered with one 
cross-linkable cysteine (H2A(K15C)) was ubiquitinated by cross-linking alkylation, 
essentially as described”!4, with the following modifications. H2A(K15C) (final 
concentration 700 µM) was incubated with HisTEV-ub(G76C) (final 
concentration 350 µM) in 250 mM Tris-Cl pH 8.6, 8 M urea and reduced with 5 mM 
TCEP (Sigma-Aldrich) for 30 min at room temperature. The bi-reactive cysteine 
cross-linker, 1,2-dibromoacetone (DBA, Santa Cruz), was dissolved in dimethyl 
formamide (DMF) and added to the protein mix to a final concentration of 4.2 mM. 
The reaction was allowed to proceed on ice for 1 h before quenching with 5 mM 
β-mercaptoethanol and reducing the pH to pH 8 with trifluoroacetic acid (TFA). 
Chemically ubiquitinated H2A (H2AKc15ub) was purified as described for the 
catalytic H2A ubiquitination (see also Extended Data Fig. 5a-c). H2AKc15ub was 
refolded into histone octamers at an equimolar ratio to the other core histones, 
essentially as described previously*. 

Nucleosome reconstitution. Nucleosome core particles (NCPs) were reconstituted 
essentially as described previously’®**. Briefly, the four core histones (with or 
without modifications) were resuspended in a guanidine hydrochloride denaturing 
buffer (20 mM Tris-Cl pH 7.5, 7M guanidine-HCl, 10mM DTT), mixed at 
equimolar ratios and then dialysed into a refolding buffer (15 mM Tris-Cl 
pH 7.5, 2 M NaCl, 1 mM EDTA, 5 mM β-mercaptoethanol) to promote folding into 
a histone octamer. Correctly folded octameric histone complexes were isolated by 
size-exclusion chromatography on an S200 GL 10/300 (GE Healthcare). 

Large-scale quantities of pUC57 8×145 bp Widom-601 DNA were isolated 
using multiple rounds of MaxiPrep kit purifications (Qiagen). Widom 601 
145 bp DNA was purified as described previously” from the pUC57 8×145 bp 
601-sequence using EcoRV restriction enzyme to digest the DNA into fragments. 
Octamers were wrapped using a gradient dialysis technique with Widom-601 
145 bp DNA in Rb-low buffer (10 mM Tris-Cl pH 7.5, 200 mM KCl, 1mM EDTA, 
1mM DTT). Proper assembly of wrapped NCPs, including all H2A and H2B 
mutants, was analysed by native PAGE as described previously*’, and stained 
with SYBR green dye (ThermoFisher, for example, see Extended Data Fig. 1h). 
For uncropped original gels, see Supplementary Fig. 1. NCP-ubme was further 
purified by differential PEG precipitation!’, by incubation with 5% (w/v) PEG- 
6000 for 10 min on ice, followed by centrifugation at 10,000g at 4°C for 10 min. 
The resulting pellet was resuspended in Rb-low buffer. 
53BP1-NCP-ubme complex formation. A complex of GST-53BP1 and NCP- 
ubme was created by incubating the constituent reagents at a 2.5:1 molar ratio. 
The complex was purified by differential PEG precipitation, essentially as 
described)”. Briefly, NCP-ubme (final concentration 20 µM) was incubated with 
GST-53BP1 (final concentration 50 µM) for 10 min on ice. PEG-6000 was added 
to a final concentration of 5% (w/v) and a precipitate allowed to form for 10 min on 
ice. The PEG precipitate was centrifuged at 10,000g for 10 min. The NCP-ubme- 
GST-53BP1 pellet was resuspended in Rb-low buffer and loaded onto a SEC650 
Enrich (BioRad) gel filtration column, pre-equilibrated in Rb-low buffer. Fractions 
eluting earlier than NCP-ubme and GST-53BP1 alone were concentrated to 
>5 mg ml⁻¹ and stored at 4 °C for downstream analysis. 

Size-exclusion chromatography multi-angle light scattering (SEC-MALS). 
SEC-MALS was performed on a Wyatt DAWN TREOS device connected to an 
Agilent Affinity FPLC using a 30NS prepacked column (Wyatt). The column was 
pre-equilibrated in Rb-low buffer and 80 µl of 15 µM protein solution (either 
NCP-ubme, GST-53BP1 or NCP-ubme-GST-53BP1) was autoloaded onto the 
column. Data processing was performed using ASTRA software (Wyatt). 
Isothermal titration calorimetry (ITC). ITC measurements were performed 
on a microCal ITC-200 (GE Healthcare) at 13°C. Prior to the experiment, all 
proteins were extensively dialysed against ITC buffer. The cell initially contained 
~200 µl of NCP-ubme (H2AKc15ub) at a concentration of 23.5 µM in ITC buffer. 
GST-53BP1 was added to the syringe at a concentration of 329 µM (14-fold molar 
excess of the NCP-ubme cell concentration) and the protein was delivered as two 
injections of 0.5 µl, followed by 24 injections of 1.5 µl, with 3-min intervals between 
injections. Control experiments were performed under identical conditions to 
determine enthalpy changes occurring in the cell upon addition of GST-53BP1 to 
buffer alone. Curve fitting was performed in Origin 7 software (GE Healthcare), 
using a standard one-site binding model and using the final four data points for 
baseline subtraction. 
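
The titration schedule above can be checked with a few lines of arithmetic. The following is a minimal Python sketch, not the authors' analysis (which was done in Origin 7): it recomputes the stated 14-fold syringe-to-cell concentration ratio and the cumulative molar ratio after each injection. It assumes a fixed 200 µl cell and ignores displacement of cell contents by each injection; all function and variable names are illustrative.

```python
# Values quoted in the ITC paragraph above.
CELL_VOLUME_UL = 200.0     # "~200 µl" of NCP-ubme in the cell (approximate)
CELL_CONC_UM = 23.5        # NCP-ubme concentration in the cell
SYRINGE_CONC_UM = 329.0    # GST-53BP1 concentration in the syringe
INJECTIONS_UL = [0.5] * 2 + [1.5] * 24   # two small, then 24 regular injections


def cumulative_molar_ratios(injections_ul, cell_volume_ul, cell_conc_um, syringe_conc_um):
    """Return the titrant:NCP molar ratio after each injection.

    Assumption: the amount of NCP in the cell stays constant, i.e. the
    displacement of cell contents by each injection is ignored.
    """
    ncp_pmol = cell_volume_ul * cell_conc_um   # µl x µM = pmol
    ratios = []
    injected_ul = 0.0
    for volume in injections_ul:
        injected_ul += volume
        ratios.append(injected_ul * syringe_conc_um / ncp_pmol)
    return ratios


fold_excess = SYRINGE_CONC_UM / CELL_CONC_UM
total_injected_ul = sum(INJECTIONS_UL)
ratios = cumulative_molar_ratios(
    INJECTIONS_UL, CELL_VOLUME_UL, CELL_CONC_UM, SYRINGE_CONC_UM
)

print(f"syringe/cell concentration ratio: {fold_excess:.1f}")
print(f"total injected volume: {total_injected_ul:.1f} µl in {len(ratios)} injections")
print(f"final molar ratio in the cell: {ratios[-1]:.2f}")
```

Under these simplifying assumptions the 26 injections deliver 37 µl in total and end at a molar ratio of about 2.6, calculated from the concentrations as stated in the text.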

Biotin-labelling of NCP-ubme. NCP-ubme complexes were labelled on random 
surface lysines with biotin by using a 1:2 ratio of NCP to EZ-Link NHS-PEG4- 
biotin (Thermo-Scientific) in labelling buffer (20 mM HEPES pH 8, 150 mM 
NaCl, 1mM DTT, 0.5mM EDTA). The mixture was incubated for 1h at room 
temperature, before buffer exchange in Rb-low to remove unreacted NHS 
ester using Zeba spin 7K desalting columns (Thermo Scientific). The extent of 
biotinylation was assessed by binding to streptavidin biosensors (Pall ForteBio), 
and equal response levels of loading were used in all experiments. Biotinylation 
did not measurably disassemble NCPs or affect binding to GST-53BP1 (Extended 
Data Fig. 1h, and data not shown). 

Bio-layer interferometry (Octet) assays. Biotinylated NCPs or biotinylated 
H4 peptides were immobilized on streptavidin biosensors, until reaching a 
threshold binding of 0.8 nm or 1.5nm (1.5 nm used for Fig. 4c only), using an 
Octet RED96 system (Pall ForteBio). Streptavidin biosensors (Pall ForteBio) were 
pre-equilibrated in Modified Kinetics buffer (1× phosphate-buffered saline (PBS), 
0.025% Tween-20 (v/v), 0.02% BSA (w/v), 0.1 mM DTT) for 10 min at room 
temperature. Experiments were set up in 96-well tray format, using a common 
protocol: 60s equilibration in Modified Kinetics buffer, loading of NCP/peptide 
to threshold value, 60s wash in Modified Kinetics buffer, 180s association in 
GST-53BP1 ligand solution matched to Modified Kinetics buffer, followed by 180s 
dissociation in Modified Kinetics buffer. Trays were shaken at 1,000 r.p.m. during 
the experiment and all experiments were performed at 30°C. Pilot experiments 
were performed for NCP-ubme and GST-53BP1 mutants to determine the optimal 
concentrations for kinetic analysis. The assay was performed on four identically 
loaded sensors, dipped into four wells with consecutive twofold dilutions of 
ligand (GST-53BP1 variants or 53BP1(1220-1631)). All dilutions were prepared 
in Modified Kinetics buffer. 

All data were normalized to baseline and corrected by subtracting the minimal 
non-specific binding measured on a blank sensor incubated with the highest 
concentration of ligand. All data were analysed in ForteBio Octet analysis software. Full binding 
kinetics experiments showed a dose-response behaviour correlated with dilution 
series. Due to the complexity of 53BP1 binding to NCP-ubme, full kinetic fitting 
was not possible. All experiments were repeated at two immobilization densities, 
with similar results obtained. Plots of a single ligand concentration (0.5 µM GST- 
53BP1 variants in all figures except Fig. 4c) were prepared in Prism (GraphPad). 
Fold differences of mutants compared to wild-type were determined at a single 
time point (180 s) across multiple different GST-53BP1 variant concentrations, 
in the association portion of the binding event. 
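
The dilution scheme and the single-time-point comparison described above can be sketched in Python as follows. This is an illustration only, not the ForteBio analysis used in the study. The top ligand concentration, the response values and the choice to average the per-concentration ratios are all assumptions made for the example, since the text does not state how the fold differences were combined.

```python
def twofold_dilutions(top_conc_um, n_wells=4):
    """Consecutive twofold dilutions, one per sensor (four wells in the text)."""
    return [top_conc_um / (2 ** i) for i in range(n_wells)]


def subtract_blank(responses_nm, blank_nm):
    """Subtract the non-specific signal measured on a blank sensor."""
    return [r - blank_nm for r in responses_nm]


def mean_fold_difference(wild_type_nm, mutant_nm):
    """Average wild-type/mutant response ratio across matched concentrations.

    Assumption: ratios are averaged with equal weight; the source does not
    specify how values from different concentrations were combined.
    """
    ratios = [wt / mut for wt, mut in zip(wild_type_nm, mutant_nm)]
    return sum(ratios) / len(ratios)


# Hypothetical top concentration and hypothetical responses (nm shift) at 180 s.
concentrations = twofold_dilutions(2.0)
wt_180s = subtract_blank([1.25, 0.85, 0.55, 0.35], blank_nm=0.05)
mutant_180s = subtract_blank([0.35, 0.25, 0.175, 0.125], blank_nm=0.05)
fold = mean_fold_difference(wt_180s, mutant_180s)

print(concentrations)
print(f"wild-type binds {fold:.1f}-fold more than the mutant at 180 s")
```

Comparing responses at a fixed association time avoids fitting a kinetic model, which matches the statement that full kinetic fitting was not possible.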

NCP pull-down assays. Pull-down assays were performed essentially as 
described". Briefly, 4 µg of GST-tagged 53BP1 constructs were immobilized on 
glutathione-sepharose resin (GE Healthcare), before incubation with 2.2 µg of 
NCP complex in pull-down buffer (50 mM Tris-Cl (pH 7.5), 150mM NaCl, 1mM 
DTT, 0.05% NP-40 (v/v), 0.1% (w/v) BSA). Pull-downs were washed thoroughly 
in pull-down buffer and resuspended directly in 2x SDS loading buffer. For the 
LANA peptide competition assay, the pull-down was performed as normal, with 
the addition of LANA peptide during the incubation with NCP. All pull-down 
assays were repeated two times, with a single immunoblot displayed. 
Immunoblotting. Proteins were separated by SDS-PAGE and transferred to 
nitrocellulose membranes. All blocking and antibody incubations were performed in 
Tris-buffered saline containing either 5% (w/v) BSA or 5% (w/v) skimmed milk 
powder. For western blotting the following commercial primary antibodies were 
used: rabbit anti-H2B (Abcam), rabbit anti-H3 (Abcam), rabbit anti-GST (Santa 
Cruz). A rabbit polyclonal H2A antibody was raised against a peptide encompassing 
human H2A residues 100-130 (KVTIAQGGVLPNIQAVLLPKKTESHHKAKGK) 
coupled to KLH (Covance). Serum from a single immunized mouse 
was found to specifically recognize histone H2A (validation of the antibody 
is shown in Supplementary Fig. 2). HRP-conjugated goat anti-rabbit IgG (Jackson 
Immunoresearch) secondary antibodies and enhanced chemiluminescence 
solution (ECL supersignal, Thermo Scientific) were used for protein 
detection. All pull-down assays were repeated two times, with a single immunoblot 
displayed. For uncropped original gels, see Supplementary Fig. 1. 

Protein cross-linking. NCP-ubme with H2B cysteine mutations (N84C or E105C) 
were assembled as previously described, except using a H3 where the sole native 
cysteine was mutated (H3(C110A)). NCPs were desalted using Zeba spin desalting 
columns (Thermo Scientific) to remove reducing agents into degassed cross- 
linking buffer (20 mM Tris-Cl (pH 6.8), 100 mM NaCl, 1mM EDTA). NCPs were 
then incubated in a 1:5 ratio with the bifunctional maleimide bismaleimidoethane 
(BMOE) for 1h at room temperature. BMOE-conjugated NCPs were again desalted 
and quantified. A final concentration of 1.5 µM BMOE-NCP-ubme was mixed with 
a sixfold molar excess of freshly desalted GST-53BP1 cysteine mutants (T1609C or 
K1628C). The reaction was incubated for 2 h at room temperature and quenched by 
addition of SDS loading buffer supplemented with 25 mM β-mercaptoethanol. The 
extent of crosslinking was assessed by immunoblotting with anti-H2B antibodies. 
Mass spectrometry. In vitro ubiquitination reaction products of H2A and H2B 
(4 µg of each) were separated on a 15% SDS-PAGE gel and bands were excised 
at the height of the non-modified, mono- and di-ub forms (based on molecular 
size markers run in parallel). Proteins were in-gel digested at 37°C for 330 min 
using 100 ng trypsin (Worthington), and gel-extracted using TFA-acidification and 
acetonitrile dehydration. Peptides were cleaned up using C18 stage tips (Thermo 
Scientific) and dried to completeness. Peptides were reconstituted in 5% (v/v) 
formic acid and loaded onto a 12cm fused silica column with a pulled tip that 
was packed in-house with 3.5 µm Zorbax C18 (Agilent Technologies). Samples 
were analysed using an Orbitrap Elite (Thermo Scientific) coupled to an Eksigent 
nanoLC ultra (AB SCIEX). Peptides were eluted from the column using a 90 min 
linear gradient from 2% to 35% (v/v) acetonitrile in 0.1% (v/v) formic acid. Tandem 
MS spectra were acquired in a data-dependent mode for the top 10 most abundant 
multiply charged peptides, with a dynamic exclusion duration of 20s. Tandem 
MS spectra were acquired using collision-induced dissociation. Mascot was used 
to search spectra against the human Refseq_V53 database, allowing up to two 
missed cleavages and including GlyGly (K) and LeuArgGlyGly (K) as variable 
modifications. Fragmentation spectra of diGly-peptides identified by Mascot were 
manually verified. 
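
To illustrate what the GlyGly (K) variable modification means for a search hit, the sketch below computes the expected m/z of a diGly-modified tryptic peptide from standard monoisotopic residue masses. It uses the H2A peptide K(GlyGly)GNYAER named in the Extended Data Fig. 1c legend as the worked case. This is a hand calculation for orientation, not a reproduction of the Mascot search, and the function name is illustrative.

```python
# Standard monoisotopic residue masses (Da) for the residues needed here.
RESIDUE_MASS = {
    "G": 57.021464, "A": 71.037114, "N": 114.042927, "K": 128.094963,
    "E": 129.042593, "R": 156.101111, "Y": 163.063329,
}
WATER = 18.010565     # added once per peptide (free N and C termini)
PROTON = 1.007276     # mass of the charge carrier
GLYGLY = 114.042927   # diglycine remnant left on lysine after trypsin digestion


def peptide_mz(sequence, charge, n_glygly=0):
    """m/z of a peptide carrying n_glygly diglycine remnants at a given charge."""
    neutral = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER + n_glygly * GLYGLY
    return (neutral + charge * PROTON) / charge


# H2A residues 37-43 with one GlyGly remnant, doubly charged.
mz = peptide_mz("KGNYAER", charge=2, n_glygly=1)
print(f"calculated m/z: {mz:.4f}")
```

The calculated value, about 476.236, agrees to within a few parts per million with the 476.237091 quoted in the figure legend for the 2+ ion.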

Determination of the intact mass of histone H4 peptides and H4 protein 
was performed at AIMS Mass Spectrometry facility, Department of Chemistry, 
University of Toronto. Electrospray ionization (ESI) mass spectra were acquired 
in positive ion mode using a 6538 UHD model quadrupole time-of-flight mass 
spectrometer equipped with a 1260 Infinity Series HPLC (Agilent Technologies, 
Santa Clara, CA). Intact proteins were mass analysed following online de-salting 
using a Tricorn 5/50 column packed with Sephadex G-25 size-exclusion media (GE 
Healthcare). The mobile phase was 1:1 (v/v) 0.1% aqueous formic acid: methanol, 
flowing at a rate of 500 µl min⁻¹. Samples were diluted in mobile phase to a 
concentration of approximately 0.1 µM and injections on the column were 2.5 µl. 
Peptides. The wild-type LANA(1-23) peptide 
(Biotin-LC-MAPPGMRLRSGRSTGAPLTRGSY), the non-binding LANA(1-23) 8LRS10 
(Biotin-LC-MAPPGMRAAAGRSTGAPLTRGSY) and the biotinylated H4K20C(12-27) peptide 
(Biotin-LC-KGGAKRHRCVLRDNIQ) were synthesized by BioBasic. Peptides 
were modified to create dimethyl-lysine analogues (using (2-chloroethyl)- 
dimethylammonium chloride (Sigma-Aldrich)) or lysine analogues (using 
(2-bromoethyl)-ammonium bromide (Millipore)) as described for histone H4. 
Peptides were purified from the reactant materials by SepPak C18 columns (Cell 
Signaling) and lyophilized dry, before resuspension in DMSO for downstream 
assays. Correct incorporation of alkylation agents was assessed by intact weight 
ESI Mass spectrometry (AIMS, Chemistry Department, University of Toronto, 
Extended Data Fig. 10b, c). 

Cryo-EM grid preparation and microscopy. Holey carbon film-coated EM 
grids were prepared with arrays of 500-800 nm holes by nanofabrication®®. 2.5 µl 
of NCP-ubme or NCP-ubme-GST-53BP1 complex was diluted to a final salt 
concentration of ~50 mM. The low-salt complexes were applied to grids and 
allowed to equilibrate for 5 s in a FEI Vitrobot grid preparation robot, and blotted 
from both sides for 15 s before freezing in a liquid ethane/propane mixture 
(1:1 v/v)°°. Grids were subsequently stored in liquid nitrogen, before transfer 
to a Gatan 626 cryotransfer specimen holder. Samples were imaged with a FEI 
F20 electron microscope, equipped with a field emission gun and operating at 
200 kV. Movies were acquired with a Gatan K2 Summit direct detector device 
(DDD) camera using a calibrated magnification of 34,483×, resulting in a physical 
pixel corresponding to 1.45 Å. The DDD was used in counting movie mode with 
5 electrons per pixel per s for 15 s and 0.5 s per frame. This exposure rate resulted 
in 1.2 electrons per Å² per frame and a total exposure of 36 electrons per Å² on the 
specimen. A total of 227 movies were acquired for the NCP-ubme with a defocus 
range from 1 to 3.2 µm. 319 movies were acquired for the NCP-ubme-GST-53BP1 
complex with defocus between 0.8 and 4.1 µm. NCP-ubme, and NCP-ubme-GST- 
53BP1 data sets were treated similarly. 
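
The exposure figures quoted above follow from the pixel size and the dose rate, as the short Python check below shows. The 5 µm physical pixel pitch of the detector is an assumption not stated in the text; the other inputs are taken directly from the paragraph. Variable names are illustrative.

```python
DETECTOR_PIXEL_UM = 5.0   # assumption: physical pixel pitch of the K2 Summit sensor
MAGNIFICATION = 34483     # calibrated magnification from the text
PIXEL_SIZE_A = 1.45       # pixel size at the specimen (Å), from the text
RATE_E_PER_PIXEL_S = 5.0  # electrons per pixel per second
EXPOSURE_S = 15.0         # total exposure time per movie
FRAME_S = 0.5             # exposure time per frame

# Pixel size implied by the magnification (1 µm = 10^4 Å).
pixel_from_mag_a = DETECTOR_PIXEL_UM * 1e4 / MAGNIFICATION

# Convert the per-pixel rate to a per-area rate on the specimen.
dose_rate_e_per_a2_s = RATE_E_PER_PIXEL_S / PIXEL_SIZE_A ** 2
dose_per_frame = dose_rate_e_per_a2_s * FRAME_S
total_dose = dose_rate_e_per_a2_s * EXPOSURE_S
n_frames = round(EXPOSURE_S / FRAME_S)

print(f"pixel size from magnification: {pixel_from_mag_a:.2f} Å")
print(f"exposure per frame: {dose_per_frame:.2f} e/Å², {n_frames} frames")
print(f"total exposure: {total_dose:.1f} e/Å²")
```

Rounded to the precision used in the text, this gives 1.45 Å per pixel, 1.2 electrons per Å² per frame and 36 electrons per Å² in total over 30 frames.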

Cryo-EM image analysis. Individual frames in a movie stack were aligned and 
averaged using the programs alignframes_lmbfgs and shiftframes*’. Contrast 
transfer function (CTF) parameters were calculated from the averaged frames 
using CTFFIND3**. Manual inspection of micrographs and their corresponding 
power spectra was performed in Relion 1.3 (ref. 39) and undesirable micrographs 
were discarded due to contamination or lack of high resolution Thon rings in their 
power spectrum. Automatic particle picking, based on manually selected templates, 
was performed in Relion 1.3. After particle extraction, beam induced particle 
motion between frames was corrected with alignparts_lmbfgs*”. A previously 
measured 2% magnification anisotropy was corrected as described*’. Extracted 
particles were subject to 2D classification in Relion 1.3 and classes with averages 
that resembled the expected projections of NCPs were selected for 3D classification. 
A low-pass filtered model of NCP based on Protein Data Bank accession code 
1KX5 (ref. 20) was used as a template for 3D classification into 9 classes (NCP- 
ubme and GST-53BP1) or 4 classes (NCP-ubme). Particle images from 3D classes 
that showed high-resolution features were refined further. Refined maps of NCP- 
ubme-GST-53BP1 were sharpened in Relion 1.3 with an automatically determined 
B-factor. NCP-ubme maps were not sharpened. Global resolution estimates were 
determined using the FSC = 0.143 criterion after a gold-standard refinement*’. 
Local resolution was estimated with ResMap*’. Calculations with Relion 
1.3 were performed using the SickKids High Performance Facility (Hospital 
for Sick Children, Toronto). All programs used are freely available through the 
respective cited distributors. 
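
For readers unfamiliar with the FSC = 0.143 criterion, the sketch below shows one simple way to read a resolution off a Fourier shell correlation curve: find the first shell where the curve drops below the threshold and interpolate linearly. It is a generic illustration and not the Relion implementation; the example curve is invented and the function name is hypothetical.

```python
def resolution_at_threshold(spatial_freqs, fsc_values, threshold=0.143):
    """Resolution (Å) at which the FSC curve first falls below the threshold.

    spatial_freqs are in 1/Å and must be increasing. The crossing point is
    found by linear interpolation between the two neighbouring shells.
    Returns None if the curve never drops below the threshold.
    """
    for i in range(1, len(fsc_values)):
        above, below = fsc_values[i - 1], fsc_values[i]
        if above >= threshold > below:
            f0, f1 = spatial_freqs[i - 1], spatial_freqs[i]
            crossing = f0 + (above - threshold) * (f1 - f0) / (above - below)
            return 1.0 / crossing
    return None


# Invented FSC curve, for illustration only.
freqs = [0.05, 0.10, 0.15, 0.20]   # 1/Å
fsc = [1.0, 0.8, 0.243, 0.043]
resolution = resolution_at_threshold(freqs, fsc)
print(f"estimated resolution: {resolution:.2f} Å")
```

In a gold-standard refinement the two half-maps are refined independently, so the curve fed to such a function is not inflated by shared noise.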

Structure editing and modelling. The atomic models of Widom-601 DNA 
(Protein Data Bank accession code 3LZ0)'4, octameric histones (Protein Data Bank 
accession code 1KX5) °, ubiquitin (Protein Data Bank accession code 1UBI)” 
and H4K20me2/53BP1 tandem Tudor domain (Protein Data Bank accession code 
2IG0)'' were fitted into the 3D maps using UCSF Chimera* without allowing 
flexibility. Map segmentation was performed in UCSF Chimera. In Fig. 2a, den- 
sity corresponding to ubiquitin in the NCP-ubme structure was displayed with a 
threshold of 0.125. The rest of the NCP-ubme and NCP-ubme-GST-53BP1 are 
displayed with a threshold of 0.29 and 0.395, respectively. The H2A/H2B sequence 
was mutated to the human H2A(K13R and K36R) and H2B manually in UCSF 
Chimera. A polyalanine model of the UDR was built within the UDR density in 
Coot, which compared well to predicted structures generated by Robetta*’. The 
UDR model was mutated and fitted in UCSF Chimera, followed by iterative rounds 
of real-space refinement in PHENIX“ and model optimization in Coot. All figures 
were prepared in UCSF Chimera. 

Cell culture. U-2-OS (U2OS) cells were purchased from ATCC and verified 
mycoplasma free. Cells were cultured with McCoy’s Medium (Gibco) supplemented 
with 10% fetal bovine serum and maintained at 37 °C in 5% CO₂ atmosphere 
conditions. Knock-down by single duplex siRNA to 53BP1 (ThermoFisher, 
D-003548-01, target sequence: 5’-GAGAGCAGAUGAUCCUUUA-3’) was 
performed with RNAiMAX (Invitrogen) two days before fixation, following 
the manufacturers’ instructions. One day before fixation, cells were transiently 
transfected with the plasmid NLS-GFP-53BP1(1220-1631) DNA using 
Lipofectamine-3000 transfection reagent (Invitrogen). 

Immunofluorescence microscopy. One hour after exposure to 2 Gy of ionizing 
radiation cells were fixed with 4% (w/v) formaldehyde. Fixed cells were permea- 
bilized in PBS, 0.3% (v/v) Triton X-100 and blocked in blocking buffer (PBS, 10% 
(v/v) goat serum, 0.5% (v/v) NP-40, 0.5% (w/v) Saponin). Coverslips were stained 
with anti-γH2AX (Millipore) and anti-mouse IgG Alexa-fluor 647 (Millipore) 
secondary antibody. DNA was counterstained with DAPI, which was used to trace 
the outline of nuclei. Stained cells were visualized on a LSM780 Zeiss confocal 
microscope. Quantification was performed on 100 U2OS cells (n = 3), in which a 
cell with >10 GFP-53BP1 foci was considered positive. 
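
The scoring rule above (a cell is positive if it has more than 10 GFP-53BP1 foci) can be expressed as a short Python sketch. The foci counts below are invented to show the calculation for three replicates of 100 cells each; they are not data from the study, and the function name is illustrative.

```python
from statistics import mean, stdev


def percent_positive(foci_per_cell, threshold=10):
    """Percentage of cells with strictly more than `threshold` foci."""
    if not foci_per_cell:
        raise ValueError("no cells to score")
    positive = sum(1 for n in foci_per_cell if n > threshold)
    return 100.0 * positive / len(foci_per_cell)


# Hypothetical foci counts: three replicates of 100 cells each.
replicates = [
    [12] * 60 + [3] * 40,
    [15] * 55 + [10] * 45,   # exactly 10 foci does not count as positive
    [11] * 65 + [0] * 35,
]
percentages = [percent_positive(cells) for cells in replicates]

print(percentages)
print(f"mean = {mean(percentages):.1f}%, s.d. = {stdev(percentages):.1f}% (n = 3)")
```

Note that the comparison is strict, so a cell with exactly 10 foci is scored as negative.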
Extended Data Figure 1 | Generation of homogenously methylated 
and ubiquitinated NCPs. a, Schematic of H4 cysteine alkylation to create 
a dimethyl-lysine analogue. b, 1D intact mass spectra of the alkylated 
H4Kc20me2 protein after desalting and lyophilization. c, Mass spectrum 
of the identified off-target K36 ubiquitination. Fragmentation spectrum 
of the 37-K(GlyGly)GNYAER-43 H2A peptide (476.237091 Da, 
2+ charge state, Mascot ions score: 45). This spectrum originates from 
di-ubiquitinated forms of in vitro ubiquitinated H2A that were separated 
by SDS-PAGE, subjected to limited trypsin digestion and analysed by 
tandem mass spectrometry. d, Immunoblot analysis of a H2A-H2B dimer 
ubiquitination reaction. Comparison of K13R, K15R and K36R triple 
mutated and K13R and K36R double mutated H2A variants 
using optimized conditions to minimize off-target ubiquitination 
for large-scale reactions. e, SDS-PAGE analysis of the first step in 
H2AK15ub purification, cation exchange chromatography (in, input; 
FT, flow-through; W, wash; M, molecular weight marker). The ubiquitin 
is tagged with an N-terminal hexa-histidine tag and TEV cleavage 
site (termed HisTEV-ub). f, SDS-PAGE analysis of the second step in 
H2AK15ub purification, nickel ion affinity chromatography. g, SDS-PAGE 
analysis of TEV protease cleavage of HisTEV-H2AK15ub and subsequent 
nickel column depletion. Cleaved H2AK15ub flows through (Ni FT) the 
column, while uncleaved product and His-tagged TEV protease bind 
it (not shown). h, Native polyacrylamide gel analysis of wrapped NCPs. 
The gel was stained with SYBR green to identify DNA. Wrapping of 
NCPs results in quenching of the SYBR green signal and a shift in the 
electrophoretic mobility of the DNA. Ubiquitinated NCPs (H2A-ub) 
appear as a doublet, which runs higher than solely methylated NCPs 
(WT H2A). Biotinylation on NCP surface lysines, required for 
downstream bio-layer interferometry analysis, does not measurably affect 
migration in the gel (H2A-ub-Bio). WT, wild-type H2A protein. 
Extended Data Figure 2 | Formation of NCP-ubme and NCP-ubme- 
GST-53BP1 complexes. a, Schematic representation of full-length human 
53BP1, 53BP1 with the native recruitment region, 53BP1(1220-1631), and 
GST-53BP1(1484-1631) (termed GST-53BP1). 53BP1(1220-1631) and 
GST-53BP1 constructs are used throughout this manuscript. Identified 
domains are highlighted; oligo: oligomerisation domain. b, Pull-down 
assay of GST-53BP1 variants containing either the 16-residue linker used 
throughout this study or a longer 34-residue linker. The L1619A 
mutant was included as a negative control. c, Bio-layer interferometry 
assays of GST-53BP1 or a native 53BP1(1220-1631) fragment. 
d, SDS-PAGE of purified 53BP1 proteins used in the pull-down and 
bio-layer interferometry assays. M, molecular weight markers. 
e, Isothermal titration calorimetry (ITC) measurement investigating the 
affinity of GST-53BP1 (syringe) to NCP-ubme (cell). Data reported as the 
mean ± s.e.m. (n = 2). f, SDS-PAGE analysis of NCP-ubme-GST-53BP1 
complex formation by differential PEG precipitation!” (in, input; S, soluble 
supernatant; P, pellet). An excess of GST-53BP1 was added, which was not 
precipitated with the NCP-ubme. g, SDS-PAGE analysis of size exclusion 
chromatography fractions, isolating NCP-ubme-GST-53BP1. In, input; 
M, size markers. Fractions 8-10 were pooled and used for SEC-MALS 
analysis (Fig. 1a) and subsequent structure determination. 
Extended Data Figure 3 | Cryo-EM structure determination and 
validation of NCP-ubme-GST-53BP1 complex. a, Representative 
cryo-EM micrograph of the NCP-ubme-GST-53BP1 complex. Example 
particle images in different orientations are boxed. b, Power spectrum 
from a representative micrograph showing Thon rings that extend beyond 
7 Å resolution. c, Example particle images, scale bar corresponds to 25 Å. 
d, Examples of 2D class averages obtained during image processing of 
the NCP-ubme-GST-53BP1 complex (CTF corrected, inverted contrast). 
Scale bar, 25 Å. e, Fourier shell correlation curve after a gold-standard 
map refinement. f, Euler angle distribution plot of all particles used for 
the symmetrized final map. Bar length and colour (blue, low; red, high) 
corresponds to number of particle images contributing to each view. 
g, Magnified view of the H2B-H4 cleft with clear side chain density 
observed for H2B Arg89, Gln92 and Arg96 (top left). Magnified view of 
the H2B αC helix and H2A α1 helix (top right). Densities for aromatic 
side-chains of H2A Tyr50, H2B Tyr121 and the bulky residue H2A Arg17 
are visible. Magnified view of the C terminus of the H2A α2 helix, with 
density for base of the side chain of Arg71 visible (bottom). h, Schematic 
of predicted location of flexible GST moiety (green) used for 53BP1 
dimerization (Protein Data Bank accession code 1Y6E)*’. No cryo-EM 
density can be attributed to GST, suggesting that it is highly flexible 
between different particles in the population sampled. Dashed lines 
indicate the 16 amino acid linker region incorporated in the GST-53BP1 
construct (black) and the flexible C-terminal tail of the GST (green). 
The linker peptide region could span up to ~80 A, allowing substantial 
flexibility of the GST dimers, shown here positioned ~50 A from the 

N terminus of the modelled Tudor domain. 
Extended Data Figure 4 | Cryo-EM structure determination and 
validation of NCP-ubme complex. a, Surface rendering of the NCP-ubme 
complex, viewed along the DNA axis and the orthogonal direction. 
Density corresponding to ubiquitin was segmented, Gaussian filtered 
and displayed with a threshold of 0.125 (area within dashed line). 
The rest of the NCP-ubme is displayed with a threshold of 0.35. Rigid body 
fitting of the high resolution structure of histone octamers (Protein Data 
Bank accession code 1KX5), Widom-601 145 bp DNA (Protein Data 
Bank accession code 3LZ0) and ubiquitin (Protein Data Bank accession 
code 1UBI) into NCP-ubme density is shown. Ubiquitin could not 
be readily placed in the attributed density. b, Representative cryo-EM 


micrograph of the NCP-ubme complex. Example particle images showing 
different orientations are boxed. c, Power spectrum from a representative 
micrograph showing Thon rings. d, A selection of particle images after 
extraction from the data set. Scale bar, 25 Å. e, Examples of 2D class 
averages obtained during image processing (CTF corrected, inverted 
contrast); scale bar, 25 Å. f, Fourier shell correlation curve after 
gold-standard map refinement. g, Euler angle distribution plot of all particle 
images used for the symmetrized final map. Bar length and colour 
correspond to the number of particle images in each view that contributed 
to the final 3D map (blue, low; red, high). 
Extended Data Figure 5 | Chemical ubiquitination of H2A and the 
constrained conformation of ubiquitin in the NCP-ubme-GST-53BP1 
complex. a, Schematic of cross-linking reaction scheme between an 
electrophilic acetone (dibromoacetone, DBA) and two engineered cysteine 
residues in H2A and ubiquitin, respectively. TCEP was added to initially 
reduce disulfide bonds. b, Pilot reactions of cross-linkable ubiquitin mixed 
with H2A. Cross-linked products were separated by SDS-PAGE. H2A-only 
and ubiquitin-only reactions identify non-productive cross-linking 
in the final reaction, H2A-H2A and ub-ub. Correctly modified H2A is 
labelled H2AKc15ub. Hexahistidine and TEV consensus sequence-tagged 
ubiquitin was used (HisTEV-ub). c, Chemically ubiquitinated 
H2AKc15ub-containing NCPs interact with GST-53BP1. Immunoblot (IB) analysis 
of GST-53BP1 pull-down (PD) using NCPs assembled with unmodified 
H2A, catalytically ubiquitinated H2A or chemically ubiquitinated H2A 
(H2AKc15ub). In this pilot experiment, the chemically ubiquitinated H2A 
runs with lower mobility due to the retention of the HisTEV tag. 
The tag was removed in all future experiments. d, Space-filling model of 
the covalently tethered ubiquitin, in a closed conformation, pulled over 
the surface of NCP. Key interacting histone residues are labelled. 
e, Bio-layer interferometry traces of a single concentration of GST-53BP1 
association and dissociation with immobilized NCP-ubme containing 
the indicated mutations in the H2B αC helix. Relative affinities are also 
shown. WT, wild-type H2B. f, Immunoblot analysis of GST-53BP1 
pull-down assay to determine the effect of mutating the αC helix H2B residues 
that potentially form a hydrogen-bonding network with closed, 
53BP1-bound ubiquitin. WT, wild-type H2B protein. g, Stained SDS-PAGE gel 
of purified, reconstituted nucleosomes used in this figure. 
H4Kc20me2-modified NCPs containing cross-linked ubiquitin at indicated residues 
in H2A, H2B or both (left). These NCPs were used in assays in panels 
e and f. Biotinylated NCP-ubme complexes containing H2B variants used 
in the bio-layer interferometry assays (right). h, Immunoblot analysis of 
GST-53BP1 pull-down assay investigating the effect of H2BK120ub on 
GST-53BP1 binding to NCP-ubme. 


© 2016 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 



Extended Data Figure 6 | Structural basis of 53BP1 specificity for 
H2AK15ub. a, Diagrammatic representation of the arginine-fingers 
mechanism of 53BP1 recognition of H2AK15ub-containing NCPs. 
Sequences of H2A mutations are detailed in c. b, Top view of the H2A 
N-terminal tail, displaying the modelled arginines projecting into the 
nucleosomal DNA grooves. For clarity, only the DNA phosphodiester 
backbone is shown. c, Immunoblot analysis of pull-down (PD) assay 
with RNF168-ubiquitinated NCPs containing the indicated H2A variants 
detailed at the bottom. GST-53BP1 can recognize H2AK13ub only when 
arginine 17 has been removed, allowing a shift in the N-terminal tail. 
Proposed R/KxxxKubxR consensus binding motif is indicated. 



Extended Data Figure 7 | Specific orientation of the UDR region 
and H2B-H4 cleft interactions. a, Schematic of 53BP1 UDR region, 
indicating sites of engineered cysteines used for BMOE cross-linking 
(purple arrowheads) (top). Surface representation of modelled NCP with 
interaction interfaces coloured; tandem Tudor domain (H4 tail: red), UDR 
(H2B-H4 cleft, H2B αC helix and acidic patch: yellow) and ubiquitin 
(H2B αC helix and H2AK15: purple) (bottom). Locations of engineered 
H2B cysteine residues used for cross-linking are indicated (N84C and 
E105C). b, Immunoblot analysis of covalently cross-linked 
NCP-ubme-GST-53BP1 variants. BMOE, a bivalent maleimide cross-linker, was first 
reacted with NCP-ubme-containing H2B single cysteine variants, before 
incubation with cysteine cross-linkable GST-53BP1 variants. 
Cross-linking to H2B is visualized by a shift in apparent molecular weight, 
equivalent to the addition of one GST-53BP1 moiety. The relatively 
weaker cross-linking of H2B(E105C) and GST-53BP1(K1628C) probably 
arises because the lysine is predicted to interact on the other face 
of the acidic patch. The asterisk denotes a non-specific band due to 
cross-reactivity of the anti-H2A antibody with non-cross-linked GST-53BP1. 
c, Magnified view of the H2B-H4 cleft at the rear of the NCP, with 
modelled UDR chain in the yellow density. Ribbon structure of Sir3 BAH 
domain (residues 75-83; Protein Data Bank accession code 3TU4), also 
proposed to interact in this region, was overlaid (purple). 
Extended Data Figure 8 | Validation of the UDR-ubiquitin interaction. 
a, Immunoblot analysis of pull-down (PD) assays, immobilizing the 
indicated GST-53BP1 UDR mutations in residues 1616-1620 and 
monitoring NCP-ubme interaction. WT, wild-type GST-53BP1 protein. 
b, SDS-PAGE of purified GST-53BP1 UDR proteins used in the pull- 
down and bio-layer interferometry assays. M, molecular weight markers. 
c, Pull-down assays of GST-53BP1 with the indicated NCP-ubme variants. 
d, Bio-layer interferometry traces of NCP-ubme prepared with the 
indicated ubiquitin variants chemically ligated to H2A at position 15. NB, 
no binding detected. e, SDS-PAGE analyses of reconstituted, biotinylated 
nucleosomes used in f. f, Enlarged view of the 53BP1-bound constrained 
modelled ubiquitin, with ball and stick representations of ubiquitin lysine 
residues indicated (left). Note the accessibility of Lys27 and Lys63, both 
reported to be following DSBs. Model of a Lys63-linked di-ubiquitin 
(Protein Data Bank accession code 3H7S) built on H2AK15 within the 
NCP-ubme-GST-53BP1 structure (right). The distal ubiquitin (pink) 
projects away from the NCP surface towards the tandem Tudor domain 

of 53BP1, shown in orange. Although we note a minor steric clash between 
the modelled distal ubiquitin and the tandem Tudor domain, we surmise 
that the inherent flexibility of ubiquitin chains, coupled with the flexibility 
of the tethered Tudor domain on the H4 tail, probably enables 53BP1 to 
bind to NCPs with Lys63-linked ubiquitin chains on H2AK15. 
Extended Data Figure 9 | 53BP1 UDR interactions with the nucleosome 
acidic patch. a, Magnified view of a ribbon representation of the 
NCP-ubme acidic patch with the overlaying density attributed to the UDR. 
b, Immunoblot analysis of a GST-34-53BP1 pull-down assay performed 
with RNF168-ubiquitinated NCP-me, in the presence of the 
acidic-patch-interacting LANA peptide (the GST-53BP1 protein used here has the 34 
amino acid linker). The indicated amounts of LANA peptide were added 
as a competitor during the pull-down (concentration in µM). 8LRS10 
peptide has negligible NCP binding and was used as a control. c, Immunoblot 
analysis of GST-53BP1 pull-downs (PD) using NCP-ubme incorporating 
the indicated H2A and H2B mutants, which localize to the acidic patch 
and adjacent H2B αC helix. d, SDS-PAGE and InstantBlue staining to 
analyse purified, reconstituted, biotinylated nucleosomes containing 
H2A/H2B mutations used in the bio-layer interferometry assays. 
Compare to wild-type NCP-ubme in Extended Data Fig. 5g (right panel). 
e, Immunoblot analysis of GST-53BP1 pull-down assays, using selected 
53BP1 UDR basic residue mutations. WT, wild-type GST-53BP1 protein. 
f, SDS-PAGE and InstantBlue staining to analyse purified GST-53BP1 
UDR variant proteins used in the bio-layer interferometry assays in 
Fig. 3e. M, molecular weight markers. g, Enlarged view of UDR-acidic 
patch interaction site coloured according to coulombic surface charge, 
overlaid with the structure of other acidic patch chromatin binding factors: 
KSHV LANA peptide (Protein Data Bank accession code 1ZLA); the Sir3 
BAH domain (Protein Data Bank accession code 3TU4) and the PRC1 
complex (Protein Data Bank accession code 4R8P). 
Extended Data Figure 10 | Flexibility of the 53BP1 tandem Tudor 
domain in the NCP-ubme structure and comparison of GST-53BP1 to 
53BP1(1220-1631). a, A selection of aligned 3D maps obtained during 
determination of the NCP-ubme and GST-53BP1 structure, with an 
enlarged view of the density from the tandem Tudor domain of 53BP1, 
shown in the lower panels. Note that the position of the tandem Tudor 
domain density is highly variable, but is always tethered over the H4 
N-terminal tail. b, 1D intact mass spectra of biotin-LC-H4(12-27) K20C 
peptide chemically alkylated to create a lysine mimic. The reaction 
proceeded to near completion, but some unreacted peptide can be 
observed at 2,190 Da. c, 1D intact mass spectra of biotin-LC-H4(12-27) 
K20C peptide chemically alkylated to create a dimethyl lysine mimic. The 
reaction proceeded to near completion, but some unreacted peptide can 
be observed at 2,190 Da. d, Bio-layer interferometry traces comparing the 
binding of GST-53BP1 with 53BP1(1220-1631) to NCP-ubme variants. 
Data from a single 53BP1 protein concentration are plotted. 
Structure of the adenosine A2A receptor bound to an engineered G protein 


Byron Carpenter1, Rony Nehmé1, Tony Warne1, Andrew G. W. Leslie1 & Christopher G. Tate1 


G-protein-coupled receptors (GPCRs) are essential components 
of the signalling network throughout the body. To understand the 
molecular mechanism of G-protein-mediated signalling, solved 
structures of receptors in inactive conformations and in the active 
conformation coupled to a G protein are necessary. Here we 
present the structure of the adenosine A2A receptor (A2AR) bound 
to an engineered G protein, mini-Gs, at 3.4 Å resolution. Mini-Gs 
binds to A2AR through an extensive interface (1,048 Å²) that is 
similar, but not identical, to the interface between Gs and the 
β2-adrenergic receptor. The transition of the receptor from an 
agonist-bound active-intermediate state (refs 4, 5) to an active 
G-protein-bound state is characterized by a 14 Å shift of the cytoplasmic end 
of transmembrane helix 6 (H6) away from the receptor core, slight 
changes in the positions of the cytoplasmic ends of H5 and H7 and 
rotamer changes of the amino acid side chains Arg³.⁵⁰, Tyr⁵.⁵⁸ and 
Tyr⁷.⁵³. There are no substantial differences in the extracellular half 
of the receptor around the ligand binding pocket. The A2AR-mini-Gs 
structure highlights both the diversity and similarity in G-protein 
coupling to GPCRs and hints at the potential complexity of the 
molecular basis for G-protein specificity. 

Structures of A2AR bound to either inverse agonists or agonists (refs 4, 5, 10) 
have elucidated the molecular determinants of subtype specificity 
and ligand efficacy. However, the mechanism of activation of the 
receptor to allow G-protein coupling and the basis of G-protein 
selectivity are not fully understood. Structures of A2AR in the inactive 
state have been determined bound to the antagonists ZM241385 
(refs 7-9), XAC, caffeine or 1,2,4-triazines, and all the structures are 
very similar. An intramembrane Na⁺ ion that can act as an allosteric 
antagonist was identified in the highest resolution structure (1.8 Å; ref. 13), 
and a homologous Na⁺ ion has been subsequently identified in 
other high-resolution structures of GPCRs. Four agonist-bound 
structures of A2AR have also been determined after co-crystallization 
with either adenosine (ref. 4), NECA (ref. 4), CGS21680 (ref. 10) or UK432097 
(ref. 5). All the structures are very similar and are thought to represent 
an active-intermediate conformation of the receptor, but not the 
fully active receptor that binds a G protein. Observations that support 
this conclusion include the presence of rotamer changes of conserved 
amino acid residues associated with activation of other GPCRs, 
and the absence of a large-scale movement of the cytoplasmic end 
of transmembrane helix 6 (H6) away from the receptor core (ref. 11). The 
G-protein-coupled state of A2AR exhibits higher binding affinity for 
agonists compared to the uncoupled state, but it is unclear whether 
the agonist-bound structures determined so far depict the binding 
pocket in a high-affinity or low-affinity conformation. Therefore, 
in order to elucidate the structure of the activated state of A2AR, we 
have determined its structure bound to a high-affinity agonist and an 
engineered G protein. 

There is a single reported structure of a GPCR bound to a 
heterotrimeric G protein, namely Gs-bound β2-adrenergic receptor 
(β2AR), which showed that virtually all the atomic contacts between 
the receptor and G protein were formed by the Gα subunit. To 
facilitate the crystallization of any GPCR-Gs complex, we developed 
a minimal G protein, mini-Gs, that comprised a truncated form of the 
GTPase domain of Gαs and included eight point mutations to stabilize 
the protein in the absence of Gβγ and in the presence of detergents 
(B.C. and C.G.T., manuscript submitted). In addition, three 
truncations removed the switch II region, 25 amino acids from 
the N terminus and the α-helical domain. Mini-Gs reproduced the 
Figure 1 | Ligand binding and overall structure of the 
A2AR-mini-Gs complex. a, The structure of A2AR is depicted as a cartoon 
in rainbow coloration (N terminus in blue, C terminus in red) with 
mini-Gs in purple. The agonist NECA bound to A2AR and GDP bound to 
mini-Gs are depicted as space-filling models (carbon, yellow; nitrogen, 
blue; oxygen, red; phosphorus, orange). Relevant secondary structural 
features are labelled. b, Mini-Gs increases the affinity of agonist 
binding to A2AR, similar to the increase observed with a heterotrimeric G protein. 
Competition binding curves were performed in duplicate (n = 3) 
by measuring the displacement of the inverse agonist ³H-ZM241385 
with increasing concentrations of the agonist NECA (Ki values in 
parentheses, see Extended Data Fig. 1 for full data): blue circles, A2AR 
(Ki 4.6 ± 0.3 µM); orange squares, A2AR and mini-Gs (Ki 430 ± 80 nM); 
green diamonds, A2AR and heterotrimeric G protein with nanobody Nb35 
(Ki 340 ± 70 nM). Error bars represent s.e.m. G proteins were all added to 
membranes containing A2AR to give a final concentration of 25 µM and the 
final concentration of NaCl was 100 mM (see Methods). c, The structure of 
β2AR (green) bound to Gs (grey and purple) is depicted as a cartoon in the 
same orientation as A2AR in a; the purple region in Gs corresponds to the 
structure of mini-Gs. 
increase in agonist affinity that occurred upon incubation of the receptor 
in the presence of the heterotrimeric G protein Gs and it also showed 
identical sensitivity to the presence of the allosteric antagonist Na⁺ 


Figure 3 | Comparison of the A2AR-mini-Gs and β2AR-Gs complexes. 
a, Structural alignment of β2AR-Gs (PDB ID: 3SN6) and A2AR-mini-Gs 
was performed by aligning the receptors alone; A2AR, rainbow colouration; 
β2AR, grey. The resultant relative dispositions of Gαs (dark grey) bound 
to β2AR and mini-Gs bound to A2AR (purple) are depicted. NECA and 
GDP are depicted as space-filling models (carbon, yellow; nitrogen, blue; 
oxygen, red; phosphorus, orange). The α-helical domains of Gαs, 
Gβγ and Nb35 have all been omitted for clarity. b-e, Detailed comparisons 
of hydrogen bonds (red dashed lines) between the respective G proteins 
and receptors; both receptors are in rainbow colouration, with mini-Gs 
in purple and Gαs in grey. Labelling of amino acid residues shows the 
Ballesteros-Weinstein numbers for the receptors and the CGN notation 
for G proteins. f, g, Views of the cytoplasmic surface of A2AR and β2AR, 
Figure 2 | Packing interactions between A2AR 
and mini-Gs. a, Diagram of A2AR depicting 
its secondary structure in the A2AR-mini-Gs 
structure. Residues shaded in grey are 
disordered in chain A and/or chain B. 
Disulphide bonds are depicted as pink lines. 
b, Cartoon of the mini-Gs topology. c, Diagram 
of contacts between mini-Gs and A2AR, with 
line thickness representing the relative number 
of interactions between amino acid residues. 
In all panels, amino acid residues depicted in 
colour are at the interface between mini-Gs and 
A2AR (within 3.9 Å), with colours reflecting 
the properties of the side chain: blue, positively 
charged; red, negatively charged; green, 
hydrophobic; yellow, hydrophilic. 


(Fig. 1 and Extended Data Fig. 1). In addition, mini-Gs readily formed 
a complex with A2AR in the presence of the agonist NECA and the 
complex was considerably more thermostable, particularly in 
respectively, as space-filling models with atoms making contacts with their 
respective G proteins coloured according to their type; carbon, green; 
nitrogen, blue; oxygen, red. Atoms coloured pink comprise conserved 
hydrophobic residues in the core of the receptors against which Arg³.⁵⁰ 
packs. h, Comparison of residues making contacts to G proteins in the 
A2AR-mini-Gs complex and the β2AR-Gs complex. Amino acid residues 
in the receptors that make contacts are coloured: red, negatively charged; 
blue, positively charged; green, hydrophobic; yellow, hydrophilic. Residues 
in white are those that do not make contact to the respective G protein, but 
the equivalent residue in the other receptor does. Ballesteros-Weinstein 
(B-W) numbers are given for residues in transmembrane α-helices, with 
a dash for residues in loops or H8. Amino acid residues 5.71-5.77 are 
disordered in the A2AR-mini-Gs structure. 




short-chain detergents, than A2AR with only NECA bound (Extended 
Data Fig. 2). This complex was crystallized in the detergent 
octylthioglucoside by vapour diffusion, a data set was collected from two crystals, 
and the structure was determined by molecular replacement (see Methods) 
and refined to 3.4 Å (Extended Data Table 1). Of the two 
A2AR-mini-Gs complexes per crystallographic asymmetric unit, the 
density in complex AC was better defined and was therefore used for all 
subsequent analyses (see Supplementary Discussion). 
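
As a quick numerical check on the affinity shift reported for NECA in Fig. 1b, the short Python sketch below converts the quoted Ki values into a fold increase in affinity relative to the receptor alone. The Ki values are those given in the Fig. 1b caption; the dictionary labels and the helper name `fold_increase` are illustrative assumptions and are not part of the original analysis.

```python
# Ki values for NECA quoted in the Fig. 1b caption, expressed in nM.
KI_NM = {
    "A2AR alone": 4600.0,                      # 4.6 uM
    "A2AR + mini-Gs": 430.0,
    "A2AR + heterotrimeric Gs + Nb35": 340.0,
}


def fold_increase(reference_ki_nm, coupled_ki_nm):
    """Fold increase in affinity: ratio of Ki values (a lower Ki means tighter binding)."""
    if reference_ki_nm <= 0 or coupled_ki_nm <= 0:
        raise ValueError("Ki values must be positive")
    return reference_ki_nm / coupled_ki_nm


reference = KI_NM["A2AR alone"]
for label, ki in KI_NM.items():
    ratio = fold_increase(reference, ki)
    print(f"{label}: Ki = {ki:.0f} nM, {ratio:.1f}-fold relative to receptor alone")
```

On these numbers, mini-Gs and the heterotrimeric G protein both shift NECA affinity by roughly an order of magnitude, which is the sense in which mini-Gs reproduces the effect of the heterotrimer. The sketch ignores the reported standard errors.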

The A2AR-mini-Gs complex contained density for the agonist NECA 
bound to A2AR and density for a molecule of GDP bound to mini-Gs 
(Extended Data Fig. 3). The presence of GDP in the mini-Gs structure 
is a reflection of the properties of the engineered G protein, which, 
after complex formation, is insensitive to GTPγS-mediated dissociation 
(B.C. and C.G.T., manuscript submitted). Mini-Gs is in a conformation 
virtually identical to that observed in the β2AR-Gs structure (see below) 
and therefore represents an active state of the G protein, consistent with 
its ability to couple to A2AR and induce high-affinity agonist binding 
(Fig. 1). The interface between A2AR and mini-Gs is formed between 
20 amino acid residues from the receptor and 17 residues in mini-Gs 
(Fig. 2, Extended Data Figs 4-6 and Supplementary Table 1), 
comprising a total buried surface area of 1,048 Å² on the receptor. In 
mini-Gs, contacts are made predominantly by the α5 helix involving 
14 amino acid residues that pack against residues in H3, cytoplasmic 
loop 2 (CL2), H5, H6, H7 and H8 of A2AR. Additional interactions 
include residues in S1, S3, the S2-S3 loop and α5 that form a 
hydrophobic pocket in which the side chain of Leu110 in CL2 of A2AR is sequestered 
(Extended Data Fig. 7). Amino acid residues in A2AR and mini-Gs form 
complementary surfaces that pack together predominantly via van der 
Waals interactions (~90% of contacts) with six polar interactions across 
the interface. Helix α5 protrudes into the cleft within the cytoplasmic 
face of A2AR created through the outward bending of the cytoplasmic 
end of H6. The apex of the α5 helix, Tyr391ᴴ⁵.²³ (superscript refers to 
the CGN system for G proteins), makes extensive van der Waals 
interactions with Arg102³.⁵⁰ (superscript refers to the Ballesteros-Weinstein 
numbering system for GPCRs), which forms the whole upper surface 
of the cleft (Fig. 3). 

Superposition of the receptors in the A2AR-mini-Gs complex and 
the β2AR-Gs complex shows that the receptors have very similar 
architectures (r.m.s.d. 1.7 Å over 1,239 atoms). The intracellular faces 
of the receptors align very well, including the large outward shift of 
the cytoplasmic end of H6 on activation. However, mini-Gs does 
not superimpose exactly on the Gα subunit of the heterotrimeric 
G protein bound to β2AR (Fig. 3). There is a difference in orientation 
of ~15°, although the difference is smaller (~10°) for the α5 helix. This 
is probably a consequence of the different amino acid residues in A2AR 
compared to β2AR (Fig. 3 and Extended Data Fig. 5), which results in 
a slightly different packing of the G proteins to the receptors, although 
we cannot discount the possible influence of lattice contacts. Alignment 
of mini-Gs with Gαs bound to β2AR shows that they are essentially 
identical (r.m.s.d. 0.92 Å over 1,158 atoms), with the most substantial 
difference being an 8° tilt between the respective α5 helices, resulting in 
a 3.7 Å displacement of the Cα of Tyr391 in mini-Gs away from the core 
of the G protein (Extended Data Fig. 8). Overall, there are 14 contacting 
residues in common between the β2AR-Gs complex and the 
A2AR-mini-Gs complex, with an additional six contacting residues present only 
in A2AR and another ten present only in β2AR (Supplementary Table 1). 
Many of the contacts between residues in the α5 helix of the G protein 
and the receptors are conserved, although the exact orientation and 
atomic contacts may differ (Fig. 3, Extended Data Fig. 6, Supplementary 
Table 1). Similarly, there is a highly conserved interaction between a 
hydrophobic residue in the centre of CL2, Leu110 in A2AR and Phe139 
in β2AR, and residues His41, Val217 and Asp215 in Gαs 
(Extended Data Fig. 7). The main difference between the A2AR-mini-Gs 
interface and the β2AR-Gs interface occurs as a result of the 
different amino acid sequences at the H7-H8 boundary. In A2AR, H7 
terminates with Arg291⁷.⁵⁶ and forms the sequence R⁷.⁵⁶IREFR (bold 
Figure 4 | Conformational changes in A2AR upon G-protein binding. 
A2AR (rainbow colouration) bound to mini-Gs (purple) was aligned with 
A2AR in the active-intermediate conformation bound to either NECA 
(PDB ID: 2YDV; ref. 4) or UK432097 (PDB ID: 3QAK; ref. 5) to highlight structural 
changes upon G-protein binding. Neither structure was used 
for both comparisons because the large extensions of the ligand UK432097 
compared to NECA distort the extracellular surface in comparison to 
the NECA-bound structure and the NECA-bound structure contains a 
thermostabilizing mutation in the intracellular half of the receptor. 
a, Alignment between 2YDV and the extracellular half of the 
A2AR-mini-Gs complex is viewed parallel to the membrane plane. 
b, Alignment with 3QAK and viewed from the cytoplasmic surface with 
mini-Gs removed for clarity. c, Alignment with 3QAK viewed parallel 
to the membrane with the cytoplasmic side at the bottom. Residues are 
labelled with their Ballesteros-Weinstein numbers and arrows depict the 
direction of movement upon mini-Gs binding. Conversion of 
Ballesteros-Weinstein and CGN numbers to amino acid residues in A2AR and 
mini-Gs, respectively, are as follows: R³.⁵⁰, Arg102; Y⁵.⁵⁸, Tyr197; K⁶.²⁹, Lys227; 
A⁶.³³, Ala231 carbonyl; L⁶.³⁷, Leu235; Y⁷.⁵³, Tyr288; Yᴴ⁵.²³, Tyr391; Lᴴ⁵.²⁵, 
Leu393; C-termᴴ⁵.²⁶, C terminus of mini-Gs (Leu394). The receptor is in 
rainbow colours and the mini-Gs is in purple. 
amino acid residues make contact with mini-Gs), compared to the 
sequence S⁷.⁵⁶PDFRI in the equivalent position of β2AR, where none 
of the residues make contacts with Gαs. Another region of the receptors 
that differs in the presence/absence of contacts to their respective 
G proteins is at the end of H5, owing to the extension of H5 in β2AR 
by an additional turn compared to A2AR (Fig. 3, Extended Data Fig. 6 
and Supplementary Table 1). From these examples it is clear that 
although the majority of amino acid residues at the interface between 
the receptor and G protein are identical, the specific atoms involved 
in the contacts differ either in terms of the amino acid side chains 
involved, their relative dispositions at the interface and/or the nature 
of the interaction. 
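
The contact counts quoted above can be cross-checked with simple set arithmetic. The Python sketch below is only an illustration: the three counts (14 shared, six A2AR-only, ten β2AR-only receptor residues) come from the text, whereas the function name `interface_summary` and the use of a Jaccard index as an overlap measure are assumptions introduced here.

```python
def interface_summary(shared, only_first, only_second):
    """Summarize the overlap of two sets of contacting residues from their counts."""
    total_first = shared + only_first        # contacting residues in the first receptor
    total_second = shared + only_second      # contacting residues in the second receptor
    union = shared + only_first + only_second
    return {
        "total_first": total_first,
        "total_second": total_second,
        "union": union,
        "jaccard": shared / union,           # overlap measure (an assumption, not from the paper)
        "shared_fraction_first": shared / total_first,
    }


# Counts taken from the text: 14 in common, 6 only in A2AR, 10 only in beta2AR.
summary = interface_summary(shared=14, only_first=6, only_second=10)
print(summary)
```

The A2AR total of 20 agrees with the 20 receptor residues reported at the A2AR-mini-Gs interface, and on these counts 70% of the A2AR contact positions are also contact positions in β2AR.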

Comparison of the active-intermediate state of UK432097-bound 
A2AR (ref. 5) with the structure of A2AR bound to mini-Gs identified 
major rearrangements in the cytoplasmic half of the receptor core 
to accommodate G-protein binding (Fig. 4); these are described in 
terms of the rearrangements required to transition from the 
active-intermediate state to the activated G-protein-bound conformation. 
First, the cytoplasmic end of H6 moves away from the receptor core 
by 14 Å as measured between the Cα atoms of Thr224⁶.²⁶ in the 
two different conformations. This movement is achieved through 
H6 bending outwards with little discernible rotation around the 
helix axis. The extent of H6 movement is dictated by van der Waals 
interactions between Lys227⁶.²⁹, Ala231⁶.³³ and Leu235⁶.³⁷ in A2AR 
and Leu393ᴴ⁵.²⁵ and the carboxy terminus of mini-Gs. The movement 
of H6 requires substantial changes in the packing of the cytoplasmic 
end of H6 with helices H5 and H7. In particular, the side chains of 
highly conserved Tyr197⁵.⁵⁸ and Tyr288⁷.⁵³ both adopt new rotamers 
to fill the space previously occupied by the side chains of Leu235⁶.³⁷ 
(the Cα of which moves by 3.7 Å) and Ile238⁶.⁴⁰ (Cα moves by 2.2 Å), 
respectively. The shift in Tyr288⁷.⁵³ allows Arg102³.⁵⁰ of the conserved 
DRY motif to adopt a fully extended conformation, packing against 
the side chain of Tyr391ᴴ⁵.²³ in the α5 helix of mini-Gs. It is notable 
that the structural change from the inactive conformation to the 
active-intermediate state is characterized by the concerted rotation 
of H5, H6 and H7, whereas the conformation change from the 
active-intermediate state to the active conformation upon mini-Gs binding 
is characterized by the bending of H6 with little further rotation. 
In contrast to the considerable rearrangements of the cytoplasmic half 
of the receptor, there are no substantial changes in the extracellular half 
of the receptor (Fig. 4, Extended Data Fig. 9). Thus, the disposition of 
the ligand binding pocket described in the active-intermediate state 
most likely describes the high-affinity state of NECA bound to A2AR. 
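
The structural comparisons in this section rest on two simple measurements made after superposition: an r.m.s.d. over matched atoms and the displacement of individual Cα atoms. The Python sketch below shows both calculations under stated assumptions. The coordinates are hypothetical and are taken to be already superposed; they are not derived from the deposited structures, and the function names are illustrative only.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation (in Å) between two matched lists of (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def displacement(atom_a, atom_b):
    """Distance (in Å) moved by one atom between two superposed conformations."""
    return math.dist(atom_a, atom_b)


# Hypothetical, already-superposed C-alpha positions for two conformations.
# The last atom stands in for the cytoplasmic end of a helix that bends outwards.
intermediate_state = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
g_protein_bound_state = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 14.0, 0.0)]

overall = rmsd(intermediate_state, g_protein_bound_state)
shift = displacement(intermediate_state[-1], g_protein_bound_state[-1])
print(f"r.m.s.d. over {len(intermediate_state)} atoms: {overall:.2f} Å")
print(f"displacement of the last C-alpha atom: {shift:.1f} Å")
```

A single large local movement dominates the r.m.s.d. in such a small example, which is one reason the text reports the H6 shift as a per-atom Cα displacement and not only as a global r.m.s.d.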

A2AR appears to have a very different energy landscape to the 
β-adrenergic receptors (βARs). Both A2AR and β2AR exist in an 
ensemble of conformations whether bound to antagonists, agonists or 
to no ligand at all, and the presence of agonists increases the probability 
of formation of a fully active state. This active state is then stabilized 
by binding of a G protein. Structures of βARs bound to agonists are 
all in a conformation very similar to the inactive state, whereas 
structures of A2AR bound to agonists are in an active-intermediate 
state (refs 4, 5, 10) very similar to the active state. Whether there is an 
active-intermediate state for βARs equivalent to that of A2AR is unknown, but 
recently, it has been proposed based on extensive electron 
paramagnetic resonance spectroscopy data that β2AR also exists in two distinct 
states in the active conformation. This work shows that the 
active-intermediate and fully active states are distinct conformations in the 
intracellular half of the receptor. Given the highly conserved nature 
of the mechanism of GPCR activation, it is likely that the 
active-intermediate of A2AR may represent a common intermediate for many 
class A GPCRs, although it may exist only transiently depending on the 
energy landscape of the receptor. 


Online Content Methods, along with any additional Extended Data display items and 
Source Data, are available in the online version of the paper; references unique to 
these sections appear only in the online paper. 
METHODS 


No statistical methods were used to predetermine sample size. The experiments 
were not randomized and the investigators were not blinded to allocation during 
experiments and outcome assessment. 

Expression and purification of mini-Gs. The mini-Gs construct used (construct 
414) was identical to mini-Gs (construct 393) described elsewhere (B.C. and C.G.T., 
manuscript submitted), except that one additional mutation, L63Y, was included 
to improve crystal quality. An N-terminal histidine tag (Hiss) and TEV protease 
cleavage site were present to facilitate purification. Mini-Gs was expressed in 
E. coli strain BL21(DE3)RIL upon induction with IPTG (50 µM) for 20 h at 25 °C. 
Cells were harvested by centrifugation and lysed by sonication in lysis buffer 
(40 mM HEPES pH 7.5, 100 mM NaCl, 10% glycerol, 10 mM imidazole, 5 mM 
MgCl2, 50 µM GDP, 1 mM PMSF, 2.5 µM Pepstatin-A, 10 µM Leupeptin, 50 µg/ml 
DNase I, 100 µg/ml lysozyme, 100 µM DTT), supplemented with Complete 
protease inhibitors (Roche). The lysate was clarified by centrifugation and loaded 
onto a 10 ml Ni²⁺ Sepharose FF column. The column was washed with wash 
buffer (20 mM HEPES pH 7.5, 500 mM NaCl, 40 mM imidazole, 10% glycerol, 
1 mM MgCl2, 50 µM GDP) and eluted with elution buffer (20 mM HEPES 
pH 7.5, 100 mM NaCl, 500 mM imidazole, 10% glycerol, 1 mM MgCl2, 50 µM 
GDP). TEV protease was added and the sample was dialysed overnight against 
dialysis buffer (20 mM HEPES pH 7.5, 100 mM NaCl, 10% glycerol, 1 mM MgCl2, 
10 µM GDP). TEV protease was removed by negative purification on 
Ni²⁺-NTA resin (Qiagen). The sample was concentrated to 1.5 ml and loaded onto a 
Superdex-200 26/600 gel-filtration column, equilibrated with gel-filtration buffer 
(10 mM HEPES pH 7.5, 100 mM NaCl, 10% glycerol, 1 mM MgCl2, 1 µM GDP, 
0.1 mM TCEP). Peak fractions were pooled and concentrated to 100 mg/ml. The 
pure protein was aliquoted, flash-frozen in liquid nitrogen, and stored at −80 °C. 
A typical yield was 100 mg of pure mini-Gs per litre of culture. 

Expression and purification of adenosine A2AR. Wild-type human adenosine
A2AR (residues 1-308) was modified to contain a C-terminal histidine tag (His10)
and TEV protease cleavage site. The N154A mutation was introduced to remove
a potential N-linked glycosylation site. Recombinant baculoviruses expressing
A2AR were prepared using the flashBAC ULTRA system (Oxford Expression
Technologies). Trichoplusia ni cells were grown in suspension in ESF921 media
(Expression Systems) to a density of 3 × 10^6 cells/ml, infected with A2AR
baculovirus and incubated for 72 h. Cells were harvested and membranes prepared
by two ultracentrifugation steps in 20 mM HEPES pH 7.5, 1 mM EDTA, 1 mM
PMSF. NECA (100 μM), NaCl (300 mM), PMSF (1 mM) and Complete protease
inhibitors (Roche) were added to the membranes, and the sample was mixed
for 30 min at room temperature. Membranes from 3 l of cells were solubilized
with 2% n-decyl-β-D-maltopyranoside (DM) on ice for 1 h. The sample was
clarified by ultracentrifugation and loaded onto a 5 ml Ni-NTA column (Qiagen).
The column was washed with wash buffer (20 mM HEPES pH 7.5, 500 mM
NaCl, 10% glycerol, 80 mM imidazole, 100 μM NECA, 0.15% DM), and eluted
with elution buffer (20 mM HEPES pH 7.5, 100 mM NaCl, 10% glycerol, 300 mM
imidazole, 100 μM NECA, 0.15% DM). The eluate was concentrated using a 50 kDa
cut-off Amicon centrifugal ultrafiltration unit (Millipore), and exchanged into
desalting buffer (10 mM HEPES pH 7.5, 100 mM NaCl, 10% glycerol, 100 μM
NECA, 0.15% DM) using a PD10 column (GE Healthcare). TEV protease was
added, and the sample was incubated on ice overnight. TEV protease was removed
by negative purification on Ni2+-NTA resin. The sample was concentrated to 0.2 ml
and loaded onto a Superdex 200 column (GE Healthcare). Peak fractions were
pooled and concentrated to approximately 20 mg/ml. A typical yield was 2 mg of
pure A2AR per litre of culture.

Complexation and crystallization. Purified A2AR was mixed with a 1.2-fold
molar excess of mini-Gs. MgCl2 (1 mM) and apyrase (0.1 U) were added, and
the mixture was incubated on ice overnight. The sample was diluted tenfold in
gel-filtration buffer (10 mM HEPES pH 7.5, 100 mM NaCl, 100 μM NECA, 0.35%
n-octyl-β-D-thioglucopyranoside (OTG)), concentrated to 0.2 ml, and loaded on
to a Superdex 200 column (pre-equilibrated in the same buffer). Peak fractions,
containing the A2AR-mini-Gs complex, were pooled and concentrated to 20 mg/ml.
The A2AR-mini-Gs complex was crystallized by vapour diffusion in OTG either
in the presence or absence of cholesterol hemisuccinate (CHS), but there was no
discernible difference in the quality of crystals that grew under the two different
conditions (the structure was determined using data collected from two crystals,
one from each condition). Crystallization plates were set up at 4 °C using 120-nl
sitting drops. Crystals used for structure solution were grown in two conditions,
either: 0.1 M NaOAc pH 5.5, 10% PEG 2000 (in the presence of CHS); or 0.1 M
NaOAc pH 5.7, 9.5% PEG 2000 MME (in the absence of CHS). Crystals were
cryo-protected in mother liquor supplemented with 30% PEG 400 and flash frozen
in liquid nitrogen.
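
The 1.2-fold molar excess used in the complexation step can be turned into a mass of mini-Gs with simple arithmetic. The following is a minimal sketch in Python; the function name `partner_mass_mg` and the molecular masses (36 kDa and 27 kDa) are hypothetical placeholders chosen for illustration and are not values reported in this work.

```python
def partner_mass_mg(receptor_mg, receptor_kda, partner_kda, molar_excess=1.2):
    """Mass of binding partner (mg) giving the requested molar excess over the receptor."""
    if receptor_kda <= 0 or partner_kda <= 0:
        raise ValueError("molecular masses must be positive")
    # mg divided by kDa gives micromoles (1 mg of a 1 kDa species is 1 micromole).
    receptor_umol = receptor_mg / receptor_kda
    return molar_excess * receptor_umol * partner_kda


# Hypothetical example: 2 mg of a 36 kDa receptor and a 27 kDa binding partner.
mass = partner_mass_mg(receptor_mg=2.0, receptor_kda=36.0, partner_kda=27.0)
print(round(mass, 2))  # 1.8
```

In practice the real molecular masses of the purified constructs, including any tags remaining after TEV cleavage, would replace the placeholder values.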

Data collection, processing and refinement. Diffraction data were collected at
the European Synchrotron Radiation Facility on beamline ID23-2 with a Pilatus
2M detector, using a 6 μm × 8 μm microfocus beam (0.8729 Å wavelength). Data
were collected using either standard or helical collection modes. Data from two
crystals were used for structure solution. Data were processed using MOSFLM (ref. 25)
and AIMLESS (ref. 26). The structure was solved by molecular replacement with
PHASER (ref. 27) using the structures of the thermostabilized A2AR (PDB ID: 2YDV)*
and the Gαs GTPase domain (residues 40-59 and 205-394) from the β2AR-Gs
complex (PDB ID: 3SN6)? as search models. Model refinement and rebuilding
were performed using REFMAC (ref. 28) and COOT (ref. 29).

Competition binding assay. FreeStyle HEK293-F cells transiently expressing
wild-type A2AR were resuspended in either assay buffer A (25 mM HEPES,
pH 7.5, 100 mM KCl, 1 mM MgCl2), assay buffer B (25 mM HEPES, pH 7.5,
100 mM NaCl, 1 mM MgCl2), or assay buffer C (25 mM HEPES, pH 7.5, 500 mM
NaCl, 1 mM MgCl2), and were lysed by 10 passages through a 26-gauge needle.
Purified binding partners were buffer-exchanged to the respective buffer before
being added to the membranes at a final concentration of 25 μM. The mixture was
aliquoted and NECA was added (0 to 1 mM final concentration, prepared in assay
buffers containing 1 U/ml apyrase). The samples were incubated for 90 min at 22 °C,
3H-ZM241385 was added at its apparent KD (2.5 nM) and allowed to bind
for a further 90 min at 22 °C. Non-specific binding was determined in the
presence of 100 μM of ZM241385. Receptor-bound and free radioligand were
separated by filtration through 96-well GF/B filter plates (pre-soaked with 0.1%
polyethyleneimine), and washed three times with the appropriate buffer. Plates
were dried and radioactivity was quantified by liquid scintillation counting using
a Tri-Carb 2910 TR (Perkin Elmer). Data were analysed by nonlinear regression
using GraphPad Prism software. The Ki for NECA binding was derived from
one-site fit Ki analysis. Data from at least three independent experiments, each
performed in duplicate, were analysed using an unpaired two-tailed t-test for
statistical significance.
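
A one-site Ki analysis of competition data is commonly based on the Cheng-Prusoff relation, Ki = IC50 / (1 + [L]/KD), where [L] is the radioligand concentration and KD its dissociation constant. The sketch below assumes that this relation underlies the fit; the function name and the IC50 value are hypothetical and only the radioligand concentration and apparent KD (both 2.5 nM) are taken from the text.

```python
def cheng_prusoff_ki(ic50_nm, radioligand_nm, kd_nm):
    """Convert a competition IC50 (nM) to Ki (nM) using the Cheng-Prusoff relation."""
    if kd_nm <= 0:
        raise ValueError("KD must be positive")
    return ic50_nm / (1.0 + radioligand_nm / kd_nm)


# Radioligand added at its apparent KD (2.5 nM), as in the assay described above.
# The IC50 of 1000 nM is a hypothetical value for illustration only.
ki = cheng_prusoff_ki(ic50_nm=1000.0, radioligand_nm=2.5, kd_nm=2.5)
print(ki)  # 500.0
```

Because the radioligand is used at its apparent KD, the correction factor is 2, so under this assumption Ki is simply half of the measured IC50.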

Thermostability assay. Membranes from Trichoplusia ni cells expressing wild-type
human A2AR were resuspended in Tm buffer (25 mM HEPES pH 7.5, 100 mM
NaCl, 1 mM MgCl2) and homogenized by ten passages through a 26-gauge needle.
Binding partner was added at a final concentration of 25 μM. 3H-NECA and
unlabelled NECA were mixed in a molar ratio of 1:5 and added to the membranes
to give a final concentration of 1 μM (approximately tenfold above the apparent
KD). The samples were incubated at room temperature for 1 h, then chilled on ice
for 30 min. DDM, DM or OG were added to a final concentration of 0.1%, 0.13%
or 0.8%, respectively, and samples were incubated on ice for 1 h. Cell debris and
insoluble material were removed by centrifugation for 5 min at 20,000g and the
supernatant was aliquoted into PCR strips. Samples were heated to the desired
temperature for exactly 30 min, then quenched on ice for 30 min. Samples (50 μl)
were loaded onto gel-filtration resin packed into a 96-well filter plate (Millipore),
which was centrifuged to separate receptor-bound from free radioligand.
Non-specific binding was determined in the presence of 200 μM unlabelled NECA.
Radioactivity was quantified by liquid scintillation counting using a MicroBeta
TriLux scintillation counter (PerkinElmer). Data were analysed by nonlinear
regression using GraphPad Prism software. Apparent Tm values were derived from
sigmoidal dose-response analysis. Results represent the mean ± s.e.m. of two
independent experiments, performed in duplicate.
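
The apparent Tm is the temperature at which half of the ligand-binding activity is lost. The sketch below illustrates the idea with a linear interpolation between the two measurements that bracket 50% binding; this is a deliberate simplification and not the sigmoidal nonlinear regression used in the assay, and the data points are hypothetical.

```python
def apparent_tm(temps_c, fraction_bound):
    """Estimate the temperature at which binding falls to 50%.

    temps_c must be in increasing order; fraction_bound is normalized so that
    1.0 is the unheated control. Linear interpolation is used between the two
    points that bracket 0.5.
    """
    points = list(zip(temps_c, fraction_bound))
    for (t_lo, f_lo), (t_hi, f_hi) in zip(points, points[1:]):
        if f_lo >= 0.5 >= f_hi and f_lo != f_hi:
            return t_lo + (f_lo - 0.5) * (t_hi - t_lo) / (f_lo - f_hi)
    raise ValueError("data do not cross 50% binding")


# Hypothetical melting data for illustration only.
temps = [4, 15, 20, 25, 30, 35, 40]
bound = [1.00, 0.98, 0.90, 0.70, 0.30, 0.08, 0.02]
print(round(apparent_tm(temps, bound), 1))  # 27.5
```

A full sigmoidal fit uses all data points and is less sensitive to noise in the two bracketing measurements, which is why it is preferred for reported values.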


25. Leslie, A. G. The integration of macromolecular diffraction data. Acta 
Crystallogr. D 62, 48-57 (2006). 

26. Evans, P. Scaling and assessment of data quality. Acta Crystallogr. D 62, 72-82 
(2006). 

27. McCoy, A. J. et al. Phaser crystallographic software. J. Appl. Crystallogr. 40, 
658-674 (2007). 

28. Murshudov, G. N. et al. REFMAC5 for the refinement of macromolecular crystal
structures. Acta Crystallogr. D 67, 355-367 (2011).

29. Emsley, P., Lohkamp, B., Scott, W. G. & Cowtan, K. Features and development
of Coot. Acta Crystallogr. D 66, 486-501 (2010).
Crystallogr. D 60, 2288-2294 (2004).
100 mM NaCl     4600 ± 340       430 ± 80       340 ± 70
500 mM NaCl     18200 ± 2400     4800 ± 230     6600 ± 1300

Extended Data Figure 1 | Pharmacological analyses of A2AR-mini-Gs
complexes. Competition assays were performed on A2AR expressed
in HEK293 cell membranes with the agonist NECA competing for the
binding of radiolabelled inverse agonist 3H-ZM241385. Experiments were
performed in the presence of either 100 mM KCl (a, b), 100 mM NaCl
(c, d) or 500 mM NaCl (e, f) to confirm the similar behaviour of mini-Gs
with heterotrimeric Gs with nanobody Nb35 for stabilization of the
complex. Results are summarized in the Table (g). All error bars represent
the s.e.m. for at least three independent experiments performed in
duplicate. Comparisons of data in b, d and f were performed using an
unpaired t-test with significance denoted by asterisks: ***P < 0.0001;
**P < 0.01; *P < 0.1; not significant (NS) P > 0.1.
Panels a-c: x axis, Temperature (°C).

d
Detergent                  A2AR only      A2AR + mini-Gs   ΔTm (°C)
Dodecylmaltoside (DDM)     27.0 ± 0.4 °C  34.2 ± 0.6 °C    7.2
Decylmaltoside (DM)        26.1 ± 0.1 °C  32.9 ± 0.5 °C    6.8
Octylglucoside (OG)        12.0 ± 0.6 °C  25.4 ± 0.3 °C    13.4

Extended Data Figure 2 | Thermostability of detergent-solubilized
3H-NECA-bound A2AR in the presence or absence of mini-Gs.
Unpurified A2AR was solubilized in detergent at the following
concentrations: a, DDM, 0.1%; b, DM, 0.13%; c, OG, 0.8%. Samples were
heated for 30 min, quenched on ice and the amount of 3H-NECA bound
determined. Data were analysed by nonlinear regression and apparent
Tm values (transition temperature where 50% of the receptor is inactive)
were determined from analysis of the sigmoidal dose-response curves
fitted (d). Results represent the mean ± s.e.m. of two independent
experiments, performed in duplicate.
Extended Data Figure 3 | Omit maps for NECA and GDP.

a-f, Orthogonal views of omit map difference density for NECA in A2AR
chain A (a, b), NECA in A2AR chain B (c, d) and GDP in mini-Gs chain C
(e, f). The contour level is 2.5σ in panels a-d and 3.0σ in panels e and f.
Figures were made using CCP4mg””.
Extended Data Figure 4 | Electron density for the interface region of
the A2AR-mini-Gs complex. The backbones of A2AR and mini-Gs are
shown in cartoon representation in light blue and magenta respectively.
Side chains are shown in stick representation (carbon, light blue; oxygen,
red; nitrogen, deep blue). The electron density of the final 2Fo − Fc map
is shown contoured at 1.2σ. For clarity, transmembrane helices H5 and
H6 and the corresponding electron density have been omitted. a, View
showing the interaction between the C-terminal helix of mini-Gs and the
CL2 loop of A2AR. b, View showing the interactions between side chains
of the C-terminal helix of mini-Gs and three Arg residues of A2AR.
Extended Data Figure 5 | Alignment of mini-Gs with GNAS2. Comparison of amino acid residues in mini-Gs (chains C and D) within 3.9 Å of
A2AR (green) in the A2AR-mini-Gs structure and the amino acid residues in bovine GNAS2 (P04896) within 3.9 Å of β2AR in the β2AR-Gs structure
(turquoise). The CGN system is used for reference.
1.50
adrb2_human 1 MGQPGNGSAFLLAPNRSHAPDHDVTQQRDEVWVVGMGIVMSLIVLAIVFGNVLVITAIAK 60
AA2AR_human 1 MPIMGSSVYITVELAIAVLAILGNVLVCWAVWL 33
AA2AR chain A 1 MPIMGSSVYITVELAIAVLAILGNVLVCWAVWL 33
AA2AR chain B 1 MPIMGSSVYITVELAIAVLAILGNVLVCWAVWL 33
2.50
adrb2_human 61 FERLQTVTNYFITSLACADLVMGLAVVPFGAAHILMKMWTFGNFWCEFWTSIDVLCVTAS 120
AA2AR_human 34 NSNLONVTNYFVVSLAAADIAVGVLAIPFAITISTGF--CAACHGCLFIACFVLVLTQSS 91
AA2AR chain A 34 NSNLONVTNYFVVSLAAADIAVGVLAIPFAITISTGF--CAACHGCLFIACFVLVLTQSS 91
AA2AR chain B 34 NSNLONVTNYFVVSLAAADIAVGVLAIPFAITISTGF--CAACHGCLFIACFVLVLTQSS 91
Kk
3.50 4.50
adrb2_human 121 IETLCVIAVDRYFAITSPFKYOSLLTKNKARVIILMVWIVSGLTSFLPIQMHWYRATHQE 180
AA2AR_human 92 IFSLLAIAIDRYIAIRIPLRYNGLVTGTRAKGIIAICWVLSFAIGLTPMLGWNNCGQPKE 151
AA2AR chain A 92 IFSLLAIAIDRYIM@RI GLVTGTRAKGIIAICWVLSFAIGLTPMLGWNNCGQPKE 151
AA2AR chain B 92 IFSLLAIAIDRYIAIRI LVTGTRAKGIIAICWVLSFAIGLTPMLGWNNCGQPKE 151
* * * Lk *
5.50
adrb2_human 181 AIN------- CYANETCCDFFTNQAYAIASSIVSFYVPLVIMVFVYSRVFQEAKROLOKI 233
AA2AR_human 152 GKNHSQGCGEGQVACLFEDVVPMNYMVYFNFFACVLVPLLLMLGVYLRIFLAARROLKOM 211
AA2AR chain A 152 KOM 211
AA2AR chain B 152 211
6.50
adrb2_human 234 DKSEGRFHVONLSQVEQDGRTGHGLRRSSKFCLKEHKALKTLGIIMGTFTLCWLPFFIVN 293
AA2AR_human 212 ESQPLPGERARS------------------ TLOKEVHAAKSLAIIVGLFALCWLPLHIIN 253
AA2AR chain A 212 ESQPLPGERARS IIVGLFALCWLPLHIIN 253
AA2AR chain B 212 ESQPLPGERARS TLOKEVHAAKSEAIIVGLFALCWLPLHIIN 253
7.50
adrb2_human 294 IVHVIQDNLI--RKEVYILLNWIGYVNSGFNPLIYCRSPDFRIAFQELLCLRRSSLKAYG 351
AA2AR_human 254 CFTFFCPDCSHAPLWLMYLAIVLSHTNSVVNPFIYAYRIREFROTFRKIIRSHVLRQQEP 313
AA2AR chain A 254 CFTFFCPDCSHAPLWLMYLAIVLSHTNSVVNPFIYAYR@REFROTFRKIIRSHVLENLYF 313
AA2AR chain B 254 CFTFFCPDCSHAPLWLMYLAIVLSHTNSVVNPFIYAYRIREFROTFRKIIRSHVLENLYF 313
adrb2_human 352 NGYSSNGNTGEQSGYHVEQEKENKLLCEDLPGTEDFVGHOGTVPSDNIDSQGRNCSTNDS 411
AA2AR_human 314 FKAAGTSARVLAAHGSDGEQVSLRLNGHPPGVWANGSAPHPERRPNGYALGLVSGGSAQE 373
AA2AR chain A 314 QGHHHHHHHHHH 325
AA2AR chain B 314 QGHHHHHHHHHH 325
adrb2_human 412 LL 413
AA2AR_human 374 SQGNTGLPDVELLSHELKGVCPEPPGLDDPLAQDGAGVS 412

Extended Data Figure 6 | Alignment of β2AR and A2AR amino
acid sequences. adrb2_human, human β2-adrenergic receptor;
AA2AR_human, human adenosine A2A receptor; AA2AR chain A, chain
A of the crystallized A2AR-mini-Gs structure; AA2AR chain B, chain B of
the crystallized A2AR-mini-Gs structure. Residues in the receptors that are
within 3.9 Å of either Gα in the β2AR-Gs complex or mini-Gs in the
A2AR-mini-Gs complex are highlighted in turquoise or green, respectively.
Key Ballesteros-Weinstein numbers are shown in blue and mutations
in the crystallized A2AR to facilitate purification and crystallization are
shown in red. Grey bars indicate the positions of α-helices in the
β2AR-Gs structure, whereas red bars represent these regions in the
A2AR-mini-Gs structure; where there is a discrepancy in helix length
between chain A and B of A2AR, the bar is coloured pink.
Extended Data Figure 7 | A conserved hydrophobic binding pocket at
the receptor-Gαs interface. The A2AR-mini-Gs complex was aligned to
the β2AR-Gs complex via the receptors; A2AR, green; β2AR, turquoise;
mini-Gs (purple); Gαs (grey).
Extended Data Figure 8 | Comparison between receptor-bound mini-Gs and Gαs. a-c, Three different views of an alignment of mini-Gs (chain C,
purple) bound to A2AR with the GTPase domain of Gαs (grey) bound to β2AR. GDP bound to mini-Gs is depicted as a space filling model (carbon,
yellow; oxygen, red; nitrogen, blue; phosphorus, orange). The α5 helix that interacts with the receptors is labelled.
Extended Data Figure 9 | Comparison of the NECA binding site in the
active-intermediate state compared to the mini-Gs-bound state. The
structure of NECA-bound A2AR (grey cartoon, with the carbon atoms
of NECA also in grey) in the active-intermediate state was aligned with
the structure of the NECA-bound A2AR-mini-Gs complex (rainbow
colouration, with the carbon atoms of NECA in green). Key amino acid
residues for both receptors are depicted (sticks; carbon atoms in the same
colour as the respective receptor) that form hydrogen bonds (red dashed
line) with either NECA or the associated water network (red spheres).
Note that the water molecules depicted are from only the NECA-bound
A2AR structure in the active-intermediate state, because the resolution of
the A2AR-mini-Gs structure was insufficient to identify water molecules.
Carbonyl oxygens are denoted by ‘co’ after the residue name.
Extended Data Table 1 | Data collection and refinement statistics

Data collection
  Space group                      P2₁2₁2₁
  Cell dimensions a, b, c (Å)      90.6, 111.8, 161.3
  Resolution (Å)*                  40.3-3.4 (3.49-3.40)
  Rmerge                           0.173 (0.747)
  I/σI                             3.6 (1.2)
  Completeness (%)                 90.6 (78.5)
  Redundancy                       2.6 (2.4)
Refinement
  Resolution (Å)                   40.3-3.4
  No. reflections                  19788
  Rwork/Rfree (%)                  28.4/31.5
  No. atoms                        7359
    Protein                        7248
    Ligand/detergent/nucleotide    44/40/27
    Water                          0
  B-factors (Å^2)
    Protein                        79.9
    Ligand/detergent/nucleotide    67.9/98.6/69.0
  R.M.S.D.
    Bond lengths (Å)               0.008
    Bond angles (°)                Li

*Values in parentheses are for the highest resolution shell.
Architecture of fully occupied GluA2 AMPA
receptor-TARP complex elucidated by cryo-EM


Yan Zhao!, Shanshuang Chen", Craig Yoshioka’, Isabelle Baconguis! & Eric Gouaux! 


Fast excitatory neurotransmission in the mammalian central 
nervous system is largely carried out by AMPA-sensitive ionotropic 
glutamate receptors'. Localized within the postsynaptic density 
of glutamatergic spines, AMPA receptors are composed of 
heterotetrameric receptor assemblies associated with auxiliary 
subunits, the most common of which are transmembrane AMPA 
receptor regulatory proteins (TARPs). The association of TARPs 
with AMPA receptors modulates receptor trafficking and the 
kinetics of receptor gating and pharmacology’. Here we report the 
cryo-electron microscopy (cryo-EM) structure of the homomeric 
rat GluA2 AMPA receptor saturated with TARP γ2 subunits, which
shows how the TARPs are arranged with four-fold symmetry around 
the ion channel domain and make extensive interactions with the 
M1, M2 and M4 transmembrane helices. Poised like partially opened 
‘hands’ underneath the two-fold symmetric ligand-binding domain 
(LBD) ‘clamshells’, one pair of TARPs is juxtaposed near the LBD 
dimer interface, whereas the other pair is near the LBD dimer-dimer 
interface. The extracellular domains of TARP are positioned to not
only modulate LBD clamshell closure, but also affect conformational 
rearrangements of the LBD layer associated with receptor activation 
and desensitization, while the TARP transmembrane domains 
buttress the ion channel pore. 

Stargazin is the founding member of the TARPs’, a family of mem- 
brane proteins related in amino acid sequence to claudin, a four-helix 
transmembrane protein*. Co-expression of recombinant AMPA 
(α-amino-3-hydroxy-5-methyl-4-isoxazole propionic acid) receptors
with TARPs largely recapitulates native receptor gating kinetics, ion 
channel properties, and pharmacology, consistent with the notion 
that TARPs are fundamental components of neuronal AMPA receptor 
signalling complexes’, yet with a heterogeneous stoichiometry ranging 
from one to four TARPs per receptor®. Stargazin, also known as TARP γ2,
modulates AMPA receptor gating by slowing deactivation and desensi- 
tization, accelerating the recovery from desensitization, increasing the 
efficacy of partial agonists such as kainate, and attenuating polyamine 
block of calcium-permeable AMPA receptors’ *. Despite progress 
in visualization of the AMPA receptor-TARP complex at a low 
resolution!®, determination of the molecular architecture of the AMPA 
receptor-TARP γ2 complex and defining a molecular mechanism for
TARP modulation of receptor function have proven elusive, in part 
because TARPs are bound weakly to the receptor and dissociate under 
typical conditions employed in complex solubilization and purification. 

X-ray crystal and single-particle cryo-EM structures of AMPA 
receptors show that they are tetrameric assemblies consisting of three 
layers: the amino-terminal domain (ATD), the LBD and the transmem- 
brane domain (TMD)!"-. Whereas the ATDs and LBDs assemble as 
two-fold symmetric dimers-of-dimers!®!’, the TMDs adopt four-fold 
symmetry, which results in a symmetry mismatch between the TMD 
and the LBD and gives rise to two-fold related conformationally distinct 
subunit pairs, A-C and B-D!!. Each LBD resembles a ‘clamshell’!® that
is open in apo and antagonist-bound states and closes upon binding
of agonists!”. Structures of the GluA2 receptor in agonist-bound pre- 
open states illustrate that the LBDs are assembled in a ‘back-to-back’ 
fashion, with agonist-induced closure of the LBDs causing a separation 
of the LBD-TMD linkers and a translation of the LBD layer closer to 
the membrane!”*. The agonist-bound desensitized state, by contrast, 
undergoes a massive rearrangement of the ATD and LBD layers, thus 
decoupling agonist binding from ion channel gating!*!*”°. 

To define the molecular basis for TARP modulation of AMPA recep- 
tor gating and pharmacology, we sought to elucidate the architecture 
of the AMPA-TARP γ2 complex by single-particle cryo-EM. Here
we focus on the wild-type homomeric rat GluA2 AMPA receptor”!, 
bearing an arginine at the Q/R site” and harbouring the flop splice 
variant”, where we have co-expressed the receptor in mammalian cells 
in combination with full-length TARP γ2 (refs 24, 25). Evidence for
formation of a physiologically relevant receptor-TARP γ2 complex in
these cells was shown by an increase in the efficacy of the partial
agonist kainate to 80 ± 2% of that of a full agonist, glutamate”® (Fig. 1a).
To define conditions for solubilization and purification of AMPA recep- 
tor fully bound with TARPs, we carried out fluorescence-detection 
size-exclusion chromatography (FSEC)*’ studies on mammalian cells 
co-expressing GluA2 receptor and an engineered enhanced green 
fluorescent protein (eGFP)-TARP γ2 fusion. By systematic screening
of detergents and lipids through FSEC, we found that whereas dodecyl 
maltopyranoside (DDM) leads to dissociation of the receptor-TARP γ2
complex, digitonin retains the complex integrity, allowing TARP to 
remain associated with receptor following solubilization and puri- 
fication (Extended Data Fig. 1a). We proceeded to purify the native 
GluA2 receptor and full-length TARP complex in the presence of the 
competitive antagonist MPQX”% (Extended Data Fig. 1b, c), succeed- 
ing in isolating a homogeneous population suitable for single-particle 
cryo-EM analysis (Extended Data Fig. 1d, e). 

Three-dimensional (3D) reconstruction of the receptor-TARP γ2
complex without the imposition of symmetry revealed an overall 
architecture consistent with previous crystal and cryo-EM structures 
of the antagonist-bound GluA2 receptor!» (Fig. 1b). The initial 3D 
classification yielded four classes, one of which had four protrusions on 
the extracellular side of the detergent micelle, related by an approximate 
four-fold axis of symmetry, and was composed of the largest number 
of particles. The remaining three classes had poorly resolved features 
associated with the extracellular domains and did not exhibit four-fold 
symmetric protrusions from the micelle, features associated with the 
presence of TARP subunits, and thus were excluded from the analysis 
(Extended Data Fig. 2). Further studies, and larger data sets, will be 
required to elucidate the structures of additional structural classes of 
the receptor-TARP γ2 complex.

To improve the density of the TARPs and the structural features of 
receptor-TARP interactions, we carried out focused refinement of the 
LBD and TMD layers, masking the conformationally heterogeneous 
ATD layer, with application of C2 symmetry coincident with the
two-fold axis that relates the LBD dimers and the four-fold axis of the TMD,
in the subsequent 3D reconstructions and refinements (Extended Data
Fig. 2). The resulting density map has an estimated resolution of 7.3 Å
(Extended Data Fig. 3) and illustrates hallmark features of the LBD
clamshells and the receptor TMD. Most importantly, the density map
clearly reveals the presence of four TARPs, arranged with four-fold
symmetry, surrounding the exterior of the receptor TMD, consistent
with a fully saturated receptor-TARP complex (Fig. 1c). The
receptor-TARP γ2 complex features a similar symmetry mismatch between the
two-fold related LBD layer and four-fold related TMD layer as found
in isolated receptor (Fig. 1d), with TARP subunit pairs A'-C' and B'-D'
‘underneath’ the A-C and B-D LBDs, respectively. We further note that
density for the full-length receptor M4 and carboxy-terminal TARP
transmembrane (TM) helices extends into the cytoplasm (Fig. 1d).

Figure 1 | Function and reconstruction of GluA2-TARP γ2 complex.
a, Whole-cell patch clamp recordings from cells expressing the
GluA2-TARP γ2 complex. A representative pair of currents recorded using the
same cell is shown. The ratio between steady-state currents evoked by
kainate and glutamate is 0.80 ± 0.02 (mean ± s.d., n=5). b, Initial 3D
reconstruction of GluA2-TARP γ2 complex contoured at lower (outer)
and higher (inner) threshold levels showing distinct features for the
receptor and associated TARP. c, Refined 3D reconstruction focused on
LBD and TMD layers, where the ATDs were excluded from refinement.
A-C and B-D subunits of the GluA2 receptor are in green and red,
respectively; TARP γ2 associated with receptor A-C and B-D subunits
are in blue and gold, respectively. d, Cross-sections of the EM map
at LBD layer, TARP-LBD interface layer, TMD layer and C-terminal layer
at indicated ‘height’, with density features coloured as in panel c.

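
The reason C2 symmetry can be applied to a map that contains both a two-fold LBD layer and a four-fold TMD layer is that a 180° rotation is itself one of the operations of four-fold symmetry. The sketch below illustrates this with hypothetical coordinates (arbitrary units, z axis taken as the pore axis); the function names and positions are assumptions made for illustration and do not come from the reconstruction.

```python
import math


def rotate_z(point, degrees):
    """Rotate an (x, y, z) point about the z axis, taken here as the pore axis."""
    a = math.radians(degrees)
    x, y, z = point
    return (x * math.cos(a) - y * math.sin(a), x * math.sin(a) + y * math.cos(a), z)


def same_set(points_a, points_b, ndigits=6):
    """Compare two sets of points after rounding away floating-point noise."""
    def key(p):
        return tuple(round(c, ndigits) for c in p)
    return {key(p) for p in points_a} == {key(p) for p in points_b}


# Hypothetical four-fold arrangement: one subunit copied by 0, 90, 180 and 270 degrees.
tarp_a = (30.0, 0.0, 0.0)
tarps = [rotate_z(tarp_a, 90 * k) for k in range(4)]

# Hypothetical two-fold arrangement: one domain and its 180 degree partner.
lbd_a = (20.0, 5.0, 40.0)
lbd_pair = [lbd_a, rotate_z(lbd_a, 180)]

# The C2 operation (180 degrees) preserves both layers...
c2_keeps_tarps = same_set(tarps, [rotate_z(p, 180) for p in tarps])
c2_keeps_lbds = same_set(lbd_pair, [rotate_z(p, 180) for p in lbd_pair])
# ...whereas a 90 degree rotation preserves only the four-fold layer.
c4_keeps_lbds = same_set(lbd_pair, [rotate_z(p, 90) for p in lbd_pair])

print(c2_keeps_tarps, c2_keeps_lbds, c4_keeps_lbds)  # True True False
```

Imposing C4 instead would average the two-fold LBD layer incorrectly, which is consistent with the choice of C2 for the focused refinement.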
To generate the structural model of GluA2 receptor-TARP γ2
complex, we extracted individual LBDs and the intact TMD from the
MPQX-bound GluA2 crystal structure!! and fit them into the cryo-EM
density as rigid bodies (Fig. 2a, b). Manual adjustments of secondary
structure elements were applied where there was supporting density,
followed by fitting of the pre-M1 and M3-S2 linkers (Fig. 2c, d). The
S2-M4 linkers were not visible in the density maps. The cryo-EM
density for the ATDs was poorly resolved, in line with their flexibility. Thus,
we did not focus on optimizing the density of the ATD layer,
concentrating instead on the crucial LBD, TMD and TARP regions. The degree
of LBD clamshell opening is similar to the MPQX-bound full-length
receptor structure'', confirming that the complex is stabilized in an
antagonist-bound state (Fig. 2a).

Figure 2 | Structure of GluA2 receptor in complex with TARP γ2.
a, b, Cryo-EM density maps for LBD (a) and TM (b) regions of the
complex. A'-C' TARPs and associated receptor TMs were omitted for
clarity. c, d, Ribbon diagram (c) and surface representation (d) of the
complex. The carboxy termini of selected TARP helices (TM4) and
receptor helices (M4) are labelled. e, Two ‘top-down’ views, in which
elements of the structure above the indicated dashed lines have been
omitted, for clarity, from the figures. The GluA2 receptor is shown in
transparent surface representation to allow visualization of the TARP
subunits.

Because de novo structure determination for TARP was not feasible
at the resolution of this study, we generated a homology model of the
TARP γ2 using the claudin-19 crystal structure as a template* (Extended
Data Fig. 4a). Rigid body fitting of the TARP model in the density
map was unambiguous, driven by strong helical density for the TARP
TMs and consistent with computational analysis of the TARP density
(Extended Data Fig. 4b-d). The particularly long TM4 of TARP, which
protrudes into the cytoplasm, provides an additional structural
landmark to validate the fitting of the TARP model to the experimental
density map. Whereas there was strong density to support the presence of a
TARP β-sheet on the extracellular side of the membrane, like that found
in claudin-19 (ref. 4) (Extended Data Fig. 5), little density was found
for the loop connecting the β1 and β2 strands, or the loop between
TM3 and TM4 (Extended Data Fig. 4a). These loops were therefore
excluded from the homology model (Extended Data Fig. 4c, d).
In addition, a short helix (α1) was placed into the tube-like density
adjacent to TM2, an assignment that was supported by the secondary
structure search scores and sequence-based secondary structural
prediction (Extended Data Fig. 4b). The final TARP model resembles a
forearm and partially open right hand, with the TMD representing the
arm, the β-sheet representing the palm, and the short α1 helix
representing the thumb (Extended Data Fig. 4c, d).

Figure 3 | Interactions between GluA2 receptor and TARP γ2.
a, TM-TM interactions between TARP γ2 and receptor at A-C positions. TARP
γ2 is shown in ribbon diagram in transparent surface representation,
whereas helices participating in interactions with receptor are highlighted
in colour. The central axis of the ion channel pore is indicated by a dashed
line. A close-up view emphasizes likely hydrophobic interactions between
TARP and receptor, whereas a top-down view illustrates that all TM helices
but M3 of the receptor interact with TARP. TM helices from selected
receptor subunits were omitted for clarity. b, Interactions between TARP
γ2 ECD and receptor LBD at the A-C positions differ from interactions
at the B-D positions. TARP γ2 ECD and receptor LBD are in closer
proximity in the B-D positions (right) than in the A-C positions (left).
The Cα atoms of the ‘KGK’ motif (697-699) are shown as spheres. Lysine
and glycine Cα are coloured in blue and grey, respectively.

The closer proximity between LBDs and TARPs at the B-D positions 
compared to the A-C positions suggests that the A'-C' and B'-D' TARPs
play non-equivalent roles in the modulation of receptor activities 
(Fig. 2c, d). Indeed, the lower lobes of the B-D LBDs have been pro- 
posed to play a greater role in ion channel gating”®. Nevertheless, it is 
possible that movements in the LBD layer upon agonist binding could 
cause the LBDs to engage the A'-C' TARPs, a hypothesis supported by 
evidence that binding of four TARPs leads to greater activation by the 
partial agonist kainate”°, highlighting the functional significance of 
TARP subunits in all four positions. 
Figure 4 | Mechanism for TARP γ2 modulation of receptor gating.
a, TARP γ2 subunits resemble partially opened palms and are positioned
‘underneath’ receptor LBDs in the antagonist-bound state. b, During
receptor activation, TARP ‘palms’ engage with receptor LBD to stabilize
intra-dimer and inter-dimer interfaces, modulating receptor activation,
deactivation and desensitization. An extracellular loop of TARP γ2 rich in
negatively charged residues facilitates the motion of receptor LBD lower
lobe rich in positively charged residues upon receptor activation, whereas
TARP γ2 TMD directly interacts with receptor TMD including the
pore-lining M2 helices, leading to modulation of receptor pore properties.


The interactions between TARP and receptor comprise two
components: TM-TM interactions and TARP extracellular domain 
(ECD)-receptor LBD interactions. The TM-TM interactions at both 
A-C and B-D positions are equivalent, obeying the four-fold symme- 
try of the TMD (Fig. 2e). The TM3 and TM4 of TARP form extensive 
hydrophobic interactions with M1 and M2 from one GluA2 subunit and 
with M4 from the adjacent subunit (Fig. 3a and Extended Data Fig. 6), 
suggesting one structural mechanism by which TARPs can modulate 
the properties of the ion channel. Given the nearly identical receptor 
TMD structure observed in apo and pre-open states, the TARP TMDs 
probably remain bound to receptor in a similar manner in activated 
states. By contrast, there are no direct contacts between the TARP 
thumb and palm and the receptor LBDs (Fig. 3b), although visualiza- 
tion of such interactions may be limited by the resolution of the recon- 
structions as well as inherent TARP flexibility. Nonetheless, a conserved 
acidic region spanning residues 85-95 (sequence: EDADYEADTAE) 
is present in the TARP extracellular ‘loops’ proximal to the α1 helix
(Extended Data Fig. 4a), poised to interact with several positively
charged residues on the lower lobe of the LBD, including the ‘KGK’
sequence at residues 697-699 of the receptor (Fig. 3b). Whereas these
elements of structure may be too distant to form salt bridges in this 
antagonist-bound state, we speculate that such interactions could take 
place in pre-open or activated states (Extended Data Fig. 7), consistent 
with the importance of both TARP ECD and the LBD ‘KGK’ motif, as 
well as a ‘lowering’ of the receptor LBD towards the membrane upon 
receptor activation.
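
The electrostatic complementarity described above can be checked with a short sketch. It counts the acidic residues in the quoted loop sequence (residues 85-95) and compares the net side-chain charge with that of the ‘KGK’ motif. The charge table is a simplification assumed here for illustration (Asp and Glu as −1, Lys and Arg as +1, everything else neutral); it is not a calculation from the paper.

```python
# Loop sequence of TARP gamma-2 residues 85-95, as quoted in the text.
ACIDIC_LOOP = "EDADYEADTAE"
FIRST_RESIDUE = 85

# Assumed, simplified side-chain charges near neutral pH (illustrative only).
SIDE_CHAIN_CHARGE = {"D": -1, "E": -1, "K": +1, "R": +1}


def net_side_chain_charge(sequence):
    """Sum the assumed integer side-chain charges over a one-letter sequence."""
    return sum(SIDE_CHAIN_CHARGE.get(residue, 0) for residue in sequence)


def acidic_positions(sequence, first_residue):
    """Return the residue numbers of Asp/Glu in the sequence."""
    return [first_residue + i for i, residue in enumerate(sequence) if residue in "DE"]


loop_charge = net_side_chain_charge(ACIDIC_LOOP)
kgk_charge = net_side_chain_charge("KGK")

print("Loop length:", len(ACIDIC_LOOP))
print("Acidic residues at:", acidic_positions(ACIDIC_LOOP, FIRST_RESIDUE))
print("Net loop charge:", loop_charge)
print("Net 'KGK' charge:", kgk_charge)
```

Under these assumptions, six of the eleven loop residues are acidic, which is consistent with the description of the loop as rich in negative charge and of the ‘KGK’ motif as its basic counterpart.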

Elucidation of the architecture of the GluA2 receptor-TARP γ2
complex was facilitated by FSEC-based screening, which shows that
digitonin stabilizes the receptor-TARP γ2 complex. Analysis of the
structure by single-particle cryo-EM illuminates how four TARPs 
encircle the receptor TMD and participate in extensive interactions 
with receptor TM helices, demonstrating the importance of non-polar 
contacts in complex formation. The acidic, partially open TARP ‘palms’ 
are positioned underneath basic motifs on the lower lobes of the LBDs 
(Fig. 4a), illustrating how complementary electrostatic interactions also 
contribute to receptor-TARP interactions (Fig. 4b). By juxtaposition 
of the TARP palms underneath the LBD clamshells, TARPs are ideally 
positioned to modulate domain closure and thus efficacy of partial 
agonists. Moreover, the A'-C' TARP proximity to the LBD dimer interface 
and the closeness of the B'-D' pair to the LBD dimer-dimer interface 
suggest how TARPs might modulate the activity of positive allosteric 
modulators and the modal gating properties of the receptor, respec- 
tively (Fig. 4a, b). We further speculate that the spatially distinct pairs 
of TARPs offer a structural explanation for the biexponential kinetics of
deactivation and desensitization of the receptor-TARP γ2 complex. Lastly,
TARP TMD extensively interacts with receptor TMD including the 
pore helix M2, stabilizing the M2 helix and selectivity filter, thereby sug- 
gesting a mechanism for TARP modulation of receptor pore properties. 

METHODS


Data reporting. No statistical methods were used to predetermine sample size. 
The experiments were not randomized and the investigators were not blinded to 
allocation during experiments and outcome assessment. 

Electrophysiology. Electrophysiology experiments were performed using a stable
cell line (clone 10) that constitutively expresses full-length wild-type TARP γ2 and
the C-terminally FLAG-tagged GluA2 AMPA receptor (flop isoform, arginine
at the Q/R site) under control of the TetON promoter. Whole-cell recordings
were carried out 10-18 h after induction of GluA2 receptor expression with
7.5 µg ml⁻¹ doxycycline. Pipettes were pulled and polished to 2-3 MΩ resistance and
filled with internal solution containing 75 mM CsCl, 75 mM CsF, 5 mM EGTA and
10 mM HEPES, pH 7.3. External solution contained 160 mM NaCl, 2.4 mM KCl,
4 mM CaCl2, 4 mM MgCl2, 10 mM HEPES pH 7.3 and 10 µM (R,R)-2b, a positive
modulator that blocks desensitization. To allow efficient binding of (R,R)-2b, each
cell was perfused in external solution for one minute before currents were elicited
by 3 mM L-glutamate and 0.6 mM kainate, individually, with a one-minute
wash step in between. Ratios of glutamate- and kainate-evoked currents determined
in five independent experiments were subjected to statistical analysis using
Origin.
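
As an illustration of the kind of summary statistic described above, the following sketch computes kainate-to-glutamate current ratios for five cells and reports their mean and sample standard deviation. The peak currents are hypothetical placeholders, the direction of the ratio (kainate over glutamate) is an assumption, and the analysis in the study was done in Origin rather than Python.

```python
from statistics import mean, stdev


def kainate_to_glutamate_ratios(recordings):
    """Return |I_kainate| / |I_glutamate| for each (glutamate, kainate) pair."""
    return [abs(kainate) / abs(glutamate) for glutamate, kainate in recordings]


# Hypothetical peak currents in pA, one (glutamate, kainate) pair per cell.
# These numbers are placeholders for illustration, not data from the study.
example_recordings = [
    (-1000.0, -600.0),
    (-800.0, -400.0),
    (-1200.0, -840.0),
    (-500.0, -300.0),
    (-2000.0, -1200.0),
]

ratios = kainate_to_glutamate_ratios(example_recordings)
ratio_mean = mean(ratios)
ratio_sd = stdev(ratios)  # sample standard deviation (n - 1 in the denominator)

print(f"n = {len(ratios)}, mean ratio = {ratio_mean:.2f}, s.d. = {ratio_sd:.3f}")
```

Taking the ratio within each cell, rather than pooling currents across cells, normalizes for differences in receptor expression between recordings.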

Expression and purification. The AMPA receptor-TARP γ2 complex was
expressed using clone 10 cells adapted to grow in suspension, using FreeStyle
293 expression medium supplemented with 2% (v/v) fetal bovine serum and selection
antibiotics (125 µg ml⁻¹ zeocin, 150 µg ml⁻¹ hygromycin and 125 µg ml⁻¹
neomycin). GluA2 (flop variant, arginine at the Q/R site) expression was induced
by addition of 7.5 µg ml⁻¹ doxycycline at a cell density of 2 × 10⁶ cells per ml.
Subsequently, MPQX was added to the media (final concentration of 200 nM)
to prevent cytotoxicity due to receptor overexpression. Cells were collected by
centrifugation around 30-35 h after induction and homogenized by sonication.
After removal of cell debris by centrifugation at 1,200g (15 min at 4°C), the
supernatant was subjected to ultracentrifugation at 100,000g for 1 h to collect the
membrane fraction.

The membrane fraction was resuspended and solubilized in TBS buffer
(20 mM Tris, pH 8.0, 150 mM NaCl) containing 1% (w/v) digitonin and 1 µM
MPQX for 2 h at 4°C. Insoluble material was removed by ultracentrifugation at
100,000g for 1 h, and the supernatant was passed through an anti-Flag
immunoaffinity column pre-equilibrated with buffer P (20 mM Tris pH 8.0, 150 mM
NaCl, 0.1% (w/v) digitonin, 1 µM MPQX), followed by a wash step using
10 column volumes of buffer P. The Flag-tagged GluA2 receptor in complex with
TARP was eluted with buffer P supplemented with 0.5 mg ml⁻¹ Flag peptide.
The eluted complex was concentrated and further purified by size-exclusion
chromatography (SEC) using a Superose 6 10/300 GL column equilibrated
in buffer P. Peak fractions were pooled and concentrated to 3 mg ml⁻¹ using a
100-kDa cut-off concentrator for subsequent biochemical analysis and cryo-EM
studies.
Cryo-EM data acquisition. A droplet of 2.5 µl of purified AMPA receptor-TARP γ2
complex at 3 mg ml⁻¹ was placed on Quantifoil 0.6-1.0 Au 200 mesh grids glow
discharged at 30 mA for 120 s. The grid was then blotted for 2-3 s at 22°C under
conditions of 100% humidity, and flash-frozen in liquid ethane.

Cryo-EM data were collected on a 300 kV Titan Krios microscope using a K2
camera positioned post a GIF quantum energy filter. The energy filter was set to a
20 eV slit and a 70 µm objective aperture was used. Micrographs were recorded in
super-resolution mode at a magnified physical pixel size of 1.35 Å, with the defocus
ranging from −1.5 to −2.5 µm. Recorded at a dose rate of 8.3 e⁻ per pixel per s,
each micrograph consisted of 40 dose-fractionated frames. Each frame was exposed
for 0.3 s, resulting in a total exposure time of 12 s and total dose of 55 e⁻ per Å².
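
The exposure figures above are mutually consistent, as the following arithmetic sketch shows. It assumes that the quoted dose rate refers to the 1.35 Å physical pixel, which is the reading under which the numbers reproduce the stated total dose.

```python
# Acquisition parameters quoted in the Methods.
DOSE_RATE_E_PER_PIXEL_PER_S = 8.3
PIXEL_SIZE_ANGSTROM = 1.35  # assumed to be the pixel the dose rate refers to
FRAMES_PER_MICROGRAPH = 40
FRAME_EXPOSURE_S = 0.3


def total_exposure_s(frames, frame_exposure_s):
    """Total exposure time of one dose-fractionated micrograph."""
    return frames * frame_exposure_s


def total_dose_e_per_a2(dose_rate, pixel_size, exposure_s):
    """Convert a per-pixel dose rate into accumulated dose per square angstrom."""
    return dose_rate * exposure_s / pixel_size ** 2


exposure = total_exposure_s(FRAMES_PER_MICROGRAPH, FRAME_EXPOSURE_S)
dose = total_dose_e_per_a2(DOSE_RATE_E_PER_PIXEL_PER_S, PIXEL_SIZE_ANGSTROM, exposure)
dose_per_frame = dose / FRAMES_PER_MICROGRAPH

print(f"Total exposure: {exposure:.1f} s")
print(f"Total dose: {dose:.1f} e-/A^2")
print(f"Dose per frame: {dose_per_frame:.2f} e-/A^2")
```

The result, about 54.7 e⁻ per Å², rounds to the 55 e⁻ per Å² reported, or roughly 1.4 e⁻ per Å² in each of the 40 frames.
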
Image processing. A total of 2,675 micrographs were subjected to motion correction
with Unblur. The CTF parameters for each micrograph were determined
by CTFFIND3 (ref. 33) and particles were picked using DoG Picker. Several
rounds of 2D classification were used to remove ice contamination, micelles,
dissociated or disordered protein and other false positives. The large number of
particles discarded was likely a consequence of using DoG Picker with a fairly large
threshold range; earlier attempts using template-based correlation resulted in an
orientation bias during 2D classification. In this way, 2D classification also served
as an opportunity to assess how well the selected particle orientations were distributed
(Extended Data Fig. 3a). Rounds of 2D classification were repeated until the
remaining classes had features recognizable from a comparison with an ensemble
of 2D projections calculated by using the crystal structure of the antagonist-bound
receptor. From an initial set of 257,378 putative particles, 61,539 particles were
selected for subsequent 3D classification (Extended Data Fig. 2).

The subset of particles (61,539) was classified into four classes using a reference
model generated from the GluA2 X-ray structure of the MPQX-bound state
(PDB code: 3KG2), which had been low-pass filtered to 60 Å. The most populated
3D class, containing 49% of total particles, featured four 4-fold symmetric
‘bumps’ on the extracellular side of the detergent micelle, and was subjected to
further 3D refinement using a soft mask focused on the LBD and TMD domains,
with C2 symmetry imposed. This further improved the quality of the density
map, allowing the final map to reach 7.3 Å resolution as estimated by Fourier shell
correlation between two independently refined half maps. Failing to show any
putative TARP features and having poorly resolved features for the extracellular
domains, the remaining three 3D classes possibly represented receptor free of
TARP or even residual false positives, and were therefore excluded from subsequent
analysis. All 2D and 3D classifications and refinements above were
performed in RELION 1.4 (ref. 37).
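
The particle attrition through this pipeline can be tallied with a few lines of arithmetic. The counts are those quoted in the Methods and in Extended Data Table 1; reading the 49% class fraction as applying to the 61,539 particles that entered 3D classification is an assumption of this sketch.

```python
# Particle counts quoted in the Methods and Extended Data Table 1.
PICKED = 257_378            # putative particles from DoG Picker
AFTER_2D = 61_539           # kept after 2D classification
TOP_CLASS_FRACTION = 0.49   # most populated 3D class (assumed fraction of AFTER_2D)
FINAL_REFINED = 26_297      # particles in the final refinement


def percent(part, whole):
    """Express part as a percentage of whole."""
    return 100.0 * part / whole


kept_after_2d_pct = percent(AFTER_2D, PICKED)
top_class_particles = round(TOP_CLASS_FRACTION * AFTER_2D)
final_of_2d_pct = percent(FINAL_REFINED, AFTER_2D)
final_of_picked_pct = percent(FINAL_REFINED, PICKED)

print(f"Kept after 2D classification: {kept_after_2d_pct:.1f}% of picks")
print(f"Most populated 3D class: about {top_class_particles} particles")
print(f"Final refinement set: {final_of_2d_pct:.1f}% of the 2D-cleaned set, "
      f"{final_of_picked_pct:.1f}% of all picks")
```

The gap between the roughly 30,000 particles in the first dominant class and the 26,297 finally refined is consistent with the additional rounds of 3D classification described in Extended Data Fig. 2.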

Structural modelling. The structural modelling for the GluA2-TARP γ2
complex comprised rigid-body fitting of LBDs and TMDs extracted
from the MPQX-bound GluA2 crystal structure (PDB code: 3KG2) into
the cryo-EM density, followed by fitting of a homologous TARP model
generated by SWISS-MODEL using the crystal structure of claudin-19 (ref. 4)
(PDB code: 3X29) and sequence alignment performed with Clustal Omega.
A 25.7% sequence similarity (11.3% identity) between claudin-19 and TARP
γ2 was determined by Sequence Manipulation Suite. Docked as a rigid body,
the derived TARP homology model was refined in real space against the density
map using COOT guided by the resolved helical density of the TARP TMs,
and the density consistent with the conserved β-sheet on the extracellular side
of the micelle. Furthermore, the TARP TM4 helical density was observed to protrude
from the cytoplasmic side of the micelle, further assisting in the fitting of
the TARP helices to the density map. The entire model was then improved by
manual adjustments including removing several loop regions outside of density,
local rigid-body fitting of individual helices into density, extension of the TARP
TM4 helix by 14 residues and positioning of a short helix (α1) adjacent to TM2
supported by secondary structure prediction (Jpred4 (ref. 42)). The extension of
the TARP TM4 helix was justified by strong density in the experimental density
maps consistent with continuation of the α-helix, scoring in SSEhunter
consistent with an α-helix, and prediction of these residues in an α-helical
conformation by secondary structure prediction (Jpred4 (ref. 42)).

To validate the fitting and the placement of TARP TM4 extension and α1 helix,
we used SSEhunter to verify the secondary structure assignment against the EM
density. To do this, the putative TARP density was extracted in Chimera using
Segger. SSEhunter analysis resulted in a series of pseudoatoms located on the
skeleton of the density map, each assigned with a score. The positive scores at
TM and α1 helix region and negative scores at the β-sheet region confirmed the
secondary structure elements present in the TARP model.

The final map was put into a large P1 unit cell (a = b = c = 405 Å;
α = β = γ = 90°) and structure factors were calculated in PHENIX. The complex
model of GluA2 receptor (LBD and TMD, residues 394-545, 564-587, 590-776,
781-826) and TARP (residues 6-38, 56-68, 72-82, (AC, 84-126)-(BD, 91-125),
131-162, 174-215; see also Extended Data Fig. 4) was then refined against structure
factors derived from the density map using phenix.real_space_refine. Secondary
structure, two-fold NCS and Ramachandran restraints were applied throughout
the entire refinement. After refinement, map CC between model and EM map was
0.716, indicative of a reasonable fit at the present resolution. The resulting model
was also used to calculate a model-map FSC curve, which agreed well with the
gold-standard FSCs generated during the RELION refinement (Extended Data
Fig. 3b). The final model has good stereochemistry, as evaluated using MolProbity
(Extended Data Table 1).

All of the figures were prepared with PyMOL, UCSF Chimera and Prism 5.

Extended Data Figure 1 | Digitonin is a suitable detergent for the
purification of GluA2 receptor-TARP γ2 complex. a, The receptor-TARP
complex in digitonin dissociates when diluted into DDM.
A complex composed of the GluA2 receptor and GFP-tagged TARP γ2
was diluted in digitonin (green) or DDM (red) before being subjected to
GFP-tuned FSEC analysis. b, Coomassie-blue-stained SDS-PAGE gel of
the purified complex. c, Tryptophan-tuned FSEC profile of the purified
complex comprised a major peak containing the tetrameric
complex and a minor shoulder, the latter suggestive of either incompletely
assembled or partially dissociated complexes. Only the full-size tetrameric
species was used for single-particle cryo-EM analysis. d, A representative,
motion-corrected micrograph of the GluA2 receptor-TARP γ2 complex
is shown. A few distinct complexes with the characteristic capital Y shape
of the non-desensitized state of the AMPA receptor are circled.
e, Representative 2D class averages showing a range of projections
of the receptor-TARP γ2 complex.

Extended Data Figure 2 | The work-flow of cryo-EM data processing.
The raw data set used in this study was composed of 2,675 micrographs.
Particles (257,378) were picked from motion-corrected and contrast
transfer function (CTF)-estimated micrographs for subsequent
classifications. After multiple rounds of 2D classification, the
remaining 61,539 particles were subjected to several rounds of 3D
classification. Initial 3D classification yielded four major classes, where
the most populated one contained 49% of total particles. An initial
3D reconstruction without imposed symmetry resulted in a moderate
resolution at 9.6 Å. With C2 symmetry imposed, subsequent 3D
refinement focused on the LBD and TMD layer improved the density map,
allowing a reconstruction at 7.6 Å resolution. An additional two iterations
of 3D classification and refinement using updated map as the reference
further improved the reconstruction. The resolution of the final cryo-EM
density map was estimated to be 7.3 Å.

Extended Data Figure 3 | Statistics for the cryo-EM reconstruction.
a, Euler angular distribution of all particles included in the final 3D
reconstruction. The number of particles viewed from each specific
orientation was indicated by the size of the corresponding sphere.
b, Gold-standard Fourier shell correlation (FSC) curves calculated
between two independently refined half-maps before (red) and after (blue)
post-processing, overlaid with FSC curve calculated between cryo-EM
density map and structural model. c, RESMAP analysis of the unfiltered
and unsharpened EM density map indicating the range of local resolution
by colour code.

Extended Data Figure 4 | Structures of the TARP γ2 subunits in the
context of the respective cryo-EM density map. a, Sequence alignment
between TARP γ2 and claudin-19 calculated using Clustal Omega. Also
shown above the alignments are the secondary structure elements of TARP
γ2 based on the model reported here, and below the aligned sequences
are the secondary structure elements derived from the claudin-19
crystal structure. The ECD region rich in negative charges is conserved
throughout the TARP family and highlighted in red. b, EM density for
B' TARP and pseudo-atoms placed by SSEhunter, each coloured according
to a calculated secondary structure score. Positive and negative scores
indicate α-helix and β-sheet propensity, respectively. A dashed line circles
a map region where high scores were found, suggesting the presence of
helical structure. A scale bar ranging from a maximum positive value
(α-helix) to the minimum negative score (β-strand) is shown. c, The
A' TARP of the A'-C' pair. d, The B' TARP of the B'-D' pair. The first and
last visible residues Arg6 and Thr215 were labelled. Secondary structure
elements were colour-coded as in a.

Extended Data Figure 5 | Structural comparison between TARP γ2 and
claudin-19. A superimposition of the TARP γ2 structure (in blue) derived
from this study and claudin-19 (in grey) is consistent in the conserved
overall fold, with the exception that there is a short α1 helix present only
in TARP γ2.


© 2016 Macmillan Publishers Limited, part of Springer Nature. All rights reserved. 




Extended Data Figure 6 | Cryo-EM density map for the pore-helix of the
GluA2 receptor. Clear density (blue mesh) is present for the pore-lining
M2 helices, secondary structure elements that are weak or absent in all
previous crystal structures. The N terminus of each pore helix is involved
in extensive interactions with TM4 from TARP subunits and we suggest
that interactions of receptor TM helices that include M2, with TARP
TM helices, stabilize the ion channel pore.

Extended Data Figure 7 | Possible interaction between receptor
LBD and TARP in an active state. Shown on the left is a ‘top-down’
view of MPQX-bound receptor-TARP complex structure (in colour)
superimposed with the crystal structure of an active state GluA2 receptor
(in grey) in complex with a partial agonist, fluorowillardiine (FW)
and a positive allosteric modulator, (R,R)-2b, using the central M3 helices
as a reference. ATDs and LBDs were omitted for clarity. The modelled
pseudo-complex consisting of TARPs and the active state receptor
illustrates a possible mechanism for how TARP interacts with receptor
LBD during activation. Enlarged views of the MPQX-bound complex
structure and FW-(R,R)-2b-bound complex model were shown side by
side at both D' and A' TARP positions. LBD helices and a S2-M4 linker
were labelled according to convention.

Extended Data Table 1 | Statistics of cryo-EM data collection, 3D reconstruction and model building

Data collection/processing
  Microscope                       Krios
  Voltage (kV)                     300
  Camera                           Gatan K2
  Camera mode                      Super-resolution
  Defocus range (µm)               −1.5 to −2.5
  Exposure time (s)                12
  Dose rate (e⁻/pixel/s)           8.3
  Magnified pixel size (Å)         1.35

Reconstruction
  Software                         RELION 1.4
  Symmetry                         C2
  Particles refined                26,297
  Resolution (unmasked, Å)         9.0
  Resolution (masked, Å)           7.3
  Map sharpening B-factor (Å²)     −600

Model statistics
  Protein residues                 2,309
  Map CC                           0.716
  Resolution (FSC = 0.5, Å)        8.0
  MolProbity score                 2.49
  Cβ deviations                    0
  Ramachandran outliers            0.27%
  Ramachandran favoured            88.28%
  RMS deviations, bond length      0.008
  RMS deviations, bond angles      1.26

CORRECTIONS & AMENDMENTS


RETRACTION 
doi:10.1038/nature17648 


Retraction: Effects of electron 
correlations on transport 
properties of iron at Earth’s core 
conditions 

Peng Zhang, R. E. Cohen & K. Haule 


Nature 517, 605-607 (2015); doi:10.1038/nature14090 


In this Letter we reported density functional theory plus dynamical 
mean-field theory (DFT + DMFT) computations of the resistivity from 
electron-electron scattering at the conditions of Earth's core, and found 
that the electron-electron scattering was about the same magnitude as 
the conventional electron-phonon scattering, giving a total resistiv- 
ity that was sufficient to allow a classical thermal-convection-driven 
dynamo. However, L. Pourovskii, J. Mravlje, S. Simak and I. Abrikosov 
could not reproduce our findings, which led us to re-examine our com- 
putations. We found an error of a factor of two that is due to our neglect 
of spin degeneracy (two electrons per band), which would halve the 
electron-electron resistivity and probably make the electron-electron 
scattering insignificant for the geodynamo, at least for pure iron. We 
therefore wish to retract this Letter. 

The smaller electron-electron scattering supports the high conductivity
of iron that was predicted from electron-phonon density functional
calculations (ref. 1). However, preliminary calculations show that using
the exact double counting (ref. 2) recently developed for the DFT + DMFT
method increases the electron-electron scattering. It is also probable
that the Wiedemann-Franz law, assumed in our previous work, is not
followed or has a non-constant Lorenz number in liquid metals (ref. 3) or
correlated systems (ref. 4). Whether the resulting conductivity is consistent
with a geodynamo driven by thermal convection requires further 
detailed calculations; the results will be reported elsewhere. The results 
and conclusions in the Letter that refer to resistivity at low tempera- 
tures (in Fig. 2b), and scattering rate and electronic structure (in Fig. 3) 
remain valid. 
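
The effect of the factor-of-two correction on a Wiedemann-Franz estimate can be sketched numerically. The example below assumes that the electron-phonon and electron-electron resistivities simply add (Matthiessen's rule) and uses the Sommerfeld value of the Lorenz number; the temperature and resistivity inputs are placeholders chosen only so that the two contributions start out equal in magnitude, and are not values from the Letter.

```python
import math

K_B = 1.380649e-23          # Boltzmann constant, J/K (exact SI value)
E_CHARGE = 1.602176634e-19  # elementary charge, C (exact SI value)

# Sommerfeld value of the Lorenz number: L0 = (pi^2 / 3) * (k_B / e)^2.
LORENZ_0 = (math.pi ** 2 / 3.0) * (K_B / E_CHARGE) ** 2


def thermal_conductivity(rho_total, temperature, lorenz=LORENZ_0):
    """Wiedemann-Franz estimate kappa = L * T / rho, in W m^-1 K^-1."""
    return lorenz * temperature / rho_total


# Placeholder inputs for illustration only (assumptions, not published values).
T_CORE = 6000.0                              # K
RHO_EP = 1.0e-6                              # electron-phonon resistivity, ohm m
RHO_EE_REPORTED = 1.0e-6                     # electron-electron term before correction
RHO_EE_CORRECTED = RHO_EE_REPORTED / 2.0     # halved once spin degeneracy is included

kappa_reported = thermal_conductivity(RHO_EP + RHO_EE_REPORTED, T_CORE)
kappa_corrected = thermal_conductivity(RHO_EP + RHO_EE_CORRECTED, T_CORE)

print(f"L0 = {LORENZ_0:.4e} W ohm K^-2")
print(f"kappa before correction: {kappa_reported:.1f} W m^-1 K^-1")
print(f"kappa after correction:  {kappa_corrected:.1f} W m^-1 K^-1")
```

With equal starting contributions, halving the electron-electron term raises the estimated thermal conductivity by one third. As the retraction itself cautions, the Wiedemann-Franz law with a constant Lorenz number may not hold in liquid metals or correlated systems, so this is an order-of-magnitude illustration only.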


1. Sha, X. & Cohen, R. E. First-principles studies of electrical resistivity of iron 
under pressure. J. Phys. Condens. Matter 23, 075401 (2011). 

2. Haule, K. Exact double counting in combining the dynamical mean field theory 
and the density functional theory. Phys. Rev. Lett. 115, 196403 (2015).

3. Yamasue, E., Susa, M., Fukuyama, H. & Nagata, K. Deviation from Wiedemann- 
Franz law for the thermal conductivity of liquid tin and lead at elevated 
temperature. Int. J. Thermophys. 24, 713-730 (2003). 

4. Joura, A. V., Demchenko, D. O. & Freericks, J. K. Thermal transport in the 
Falicov-Kimball model on a Bethe lattice. Phys. Rev. B 69, 165105 (2004). 


112 | NATURE | VOL 536 | 4 AUGUST 2016 




CORRIGENDUM 
doi:10.1038/nature17984 


Corrigendum: Structure of 
promoter-bound TFIID and model 
of human pre-initiation complex 
assembly 


Robert K. Louder, Yuan He, José Ramón López-Blanco,
Jie Fang, Pablo Chacón & Eva Nogales


Nature 531, 604-609 (2016); doi:10.1038/nature17394


In this Article, author Pablo Chacón should have affiliation 4
(Department of Biological Physical Chemistry, Rocasolano Physical 
Chemistry Institute, CSIC, Serrano 119, Madrid 28006, Spain), not 
affiliation 3. This error has been corrected online. 


CORRIGENDUM 
doi:10.1038/nature17673 


Corrigendum: Flexible high-
temperature dielectric materials
from polymer nanocomposites


Qi Li, Lei Chen, Matthew R. Gadinski, Shihai Zhang,
Guangzu Zhang, Haoyu U. Li, Elissei Iagodkine, Aman Haque,
Long-Qing Chen, Thomas N. Jackson & Qing Wang


Nature 523, 576-579 (2015); doi:10.1038/nature14647 


The author Elissei Iagodkine was erroneously omitted from the author
list of this Letter. They are associated with the affiliation: Dow Chemical
Company, 455 Forest Street, Marlborough, Massachusetts 01752, USA,
and the Author Contributions section should have included the statement:
E.I. provided research-grade BCB used in the preparation of the
samples reported, and also participated in helpful discussions. This has
been corrected in the online versions of the paper. In addition, authors
Haoyu Li and Tom Jackson should have been listed as Haoyu U. Li and
Thomas N. Jackson, respectively; the author list, Acknowledgements
and Author Contributions sections have therefore been altered
accordingly.

A good conference poster can get you noticed and spur discussions that can expand your network.


CONFERENCE PRESENTATIONS 


Lead the poster parade 


An eye-catching presentation can attract potential collaborators — and even a cash prize. 


BY CHRIS WOOLSTON 


Stroll through a decent-sized scientific
conference and you'll probably face a
bewildering number of posters, many
more than you could ever hope to read in one
day. So you have to pick and choose. Perhaps
one reminds you of the worst PowerPoint
presentation you ever endured; another is
crammed with thousands of words in microscopic
font. But you encounter one featuring a
bold illustration, splashes of colour, readable
text and clean lines. You pause for a closer look,
chat with the presenter and discover common
research interests. You've made a connection,
and at least one poster has accomplished what
its creator meant it to do.

The scientific poster remains a crucial
currency for communication and connection,
says biophysicist Anthony Salvagno, director of 
education for #SciFund Challenge, a non-profit 
organization in Santa Barbara, California, that 
specializes in science-communication training. 
Through SciFund, he co-teaches a five-week 
online course on poster design along with biolo- 
gist Zen Faulkes of the University of Texas Rio 
Grande Valley in Edinburg. 

Researchers now have access to an array of 
high-end graphics software — and the ‘how to 
make a poster’ conversation has been going on 
for years (see Nature 483, 113-115; 2012). But 
that hasn't stemmed the flow of visual clun- 
kers. As Salvagno explains, researchers often 
slap posters together at the last minute instead 
of thinking about the best ways to deliver their 
message and engage their audience. 


But those who have the vision — and com- 
puter skills — to avoid distracting design 
blunders will draw the right kind of attention 
to themselves, their findings and their ideas. 
They might even win an award (see ‘Tips for
making your poster stand out’), although the
main goals are to publicize their science and 
scientific identity while forging new associa- 
tions. “A good poster will help you make better 
connections,” Salvagno says. “Just one conver- 
sation can turn into a huge success.” 

Trishna Dutta, a wildlife researcher at Colum- 
bia University in New York City who studies 
tigers in India, says that lessons from the poster 
course helped to spark productive conversations 
at the 2015 International Congress for Conser- 
vation Biology (ICCB) in Montpellier, France. 
She had signed up for the course specifically
to make an impression at the conference. She
also wanted to make up for past failures. “My 
first posters were bad,” she says. “I didn’t have 
the aesthetic sense of what goes with what.” 
Worse, comments from attendees suggested 
that her key points were often lost, especially 
for those outside her speciality. “That was a case 
where I needed to know my audience,” she says.
“People there studied everything from bacteria 
to elephants. I’m not sure they got my message.” 

Her ICCB poster was far clearer. A subhead- 
ing spelled out the take-away message of tiger 
migration, the text was orderly and easy to 
read, maps added colour as well as context, and 
a photo of a wild tiger near the centre captured 
the eye. “I still don't make excellent posters, but
I'm getting the hang of it,” she says.

Anxiety about these visual presentations is 
widespread. When Vasco Elbrecht uploaded 
a set of scientific-poster tutorials on YouTube 
(go.nature.com/2akrsly), he realized that he 
had underestimated the demand for such help. 
“I would have been happy if just a few of my 
friends watched them,” says the PhD student at
the University of Duisburg-Essen in Germany. 
So far, his poster tutorials have racked up more 
than 31,900 views. 

In his most-viewed video, Elbrecht shows 
examples of good and bad posters from his 
own repertoire. His first — about the genetics 
of Microbotryum fungus — was bogged down 
with huge swathes of text, a common pitfall. “I
tried to fit everything I could on it,” he says. “But
at a conference, nobody is going to stand there 


and read it for ten minutes.” In a later, more suc- 
cessful poster about the genetic diversity of the 
stonefly Dinocras cephalotes, he limited the text 
to a few hundred words — roughly the same 
count as an abstract (see ‘A winning view’). That's
generally enough to deliver a key message and 
entice passers-by without overwhelming them, 
he says. The design also helped him to win a 
€1,500 (US$1,660) research prize for his poster 
and abstract from the Institute for the Advance- 
ment of Water Quality and Water Resources 
Management in Essen, Germany, in 2014. 


LESS IS MORE 

Salvagno and Faulkes’s poster class stresses
the same point: when it comes to text, less is 
more. Poster-makers often already know that 
too much text can be off-putting, but many are 
still unable to resist the temptation to include 
practically everything they know about their 
subject. “When I ask people what they dislike 
about posters, too much text is the number- 
one complaint,” Salvagno says. “People hate 
seeing it on other people’s posters, but they do 
it on their own.”

Of course, there’s more to it than getting the 
right word count. Text and graphics have to flow 
together in a way that's as visually appealing as it 
is informative. That takes a designer's eye — or a 
willingness to copy from people who know what 
they are doing. Elbrecht encourages researchers 
to borrow elements from posters that they like. 
“All design is redesign,” he says. “There's no need 
to be original.”


EYES ON THE PRIZE 


Tips for making your poster stand out 


Good posters are supposed to communicate
results and foster connections, but a
first-place ribbon wouldn’t hurt, either. A
poster prize is more than a badge of honour:
it’s an accomplishment that would look great
on a CV. Here are some tips for getting the
prize.


• Scientific modelling. Winning posters
often go beyond flat text and graphics.
Where appropriate, consider building a
3D model of your study subject. “It doesn’t
add any scientific value, but it gets people’s 
attention,” says Vasco Elbrecht, a PhD 
student at the University of Duisburg-Essen 
in Germany who won a cash prize for his 
poster in 2014. 


• Tech it up. Technology has opened up
new possibilities: some conferences allow
attendees to bolster their posters with videos 
on a tablet or similar device. “If you really 
want to go for a poster prize, have a QR 
smartphone barcode for a video on your 
topic,” Elbrecht says. 


• Do a test run. Before you ever set foot
in a conference, you should be confident
that your poster has all the clarity, appeal 
and impact that you intended. “The 
important thing is to get honest feedback,” 
Elbrecht says. “Show it to people in another 
department if necessary.” 


• Know your audience. Hedwig van der Meer,
a PhD student at the Amsterdam University
of Applied Sciences in the Netherlands,
was an underdog at the 2016 American
Academy of Orofacial Pain poster session
in Florida. “I was a physical therapist from
the Netherlands going up against all of
these American doctors,” she says. “I didn’t
think I could win, especially after seeing the
other posters.” Her presentation was heavy
with text but short on colour. Yet it worked
because the presentation and the topic
(the connection between temporomandibular
disorders and headaches) hit the sweet
spot. “The audience was a match,” she says.
“I had a clear message, and I’m passionate
about what I do.” C.W.

Effective posters take many shapes, but they
tend to have some basic elements in common, 
says Sam Hertig, a freelance scientific illustra- 
tor in Berne, Switzerland. Hertig, who has just 
completed a postdoc in computational biology, 
gave a talk on creating a visually striking scien- 
tific poster at Stanford University in California 
earlier this year and uploaded the presentation 
to YouTube (go.nature.com/2aetlrc). As he 
explains, a “stunning” poster generally starts 
with a gripping centrepiece image, whether of 
a molecule, organism or galaxy. One of his own
recent posters featured a multicoloured image 
of HIV. “Be daring,” he says in the presentation. 
“There may be hundreds or thousands of post- 
ers at a conference. You want something that 
will stand out.” 

Hertig says that the text of a poster should 
have its own visual appeal. In most cases, the 
text will be neatly arranged in 2 to 4 columns on
a poster that’s about 91 cm by 122 cm. The font, 
which should be consistent throughout, must 
be clear and easy to read (not something like 
Comic Sans), and should be at least 24 points. 

The poster should be printed to the maxi- 
mum size allowed by the conference, and the 
title should be large and legible from a distance. 
The subheadings — which should also be clear 
and visible — should say something more 
dynamic than ‘Results’. If, for instance, research
uncovered a 5% decline in the reproductive 
success of heat-stressed frogs, the heading for 
the results section should hint at that finding. 

A WINNING VIEW

Vasco Elbrecht’s award-winning poster about
the genetic diversity of the stonefly Dinocras
cephalotes has an eye-catching centrepiece
image and limits text to a few hundred words
— enough to deliver a key message and entice
passers-by.


ERIKA DEBENEDICTIS/MIT MEDIA LAB


Hertig says that the placement of white
space is an important but often overlooked
aspect of poster design. Visually attractive 
posters tend to have substantial borders and 
significant gaps between text blocks. The 
white space should flow together in a cohe- 
sive way that draws in the eye while giving 
it a chance to rest. In a room full of posters 
screaming for attention, he says, some well- 
placed emptiness can offer tranquility. 


THE RIGHT TOOL FOR THE JOB 

Yet these design aesthetics won't amount 
to much without the right software. Many 
researchers resort to PowerPoint, usu- 
ally because they already have PowerPoint 
figures at hand. It can work: Hedwig van 
der Meer, a physiotherapy PhD student 
at the Amsterdam University of Applied 
Sciences in the Netherlands, used Power- 
Point to make her first-place poster at the 
2016 conference of the American Academy 
of Orofacial Pain in Orlando, Florida. But 
Salvagno advises against the program: it isn’t 
designed for printing, the colours may be off 
and the alignment tools are cumbersome. If 
PowerPoint is the only option, he recom- 
mends disabling the ‘snap to grid’ function 
for maximum control of the layout. 

Hertig recommends vector-based graph- 
ics programs such as Inkscape or Adobe 
Illustrator. Unlike PowerPoint and other 
programs that create illustrations with 
pixels, both of these use equations to deter- 
mine each point; images and text can thus 
be scaled up without loss of clarity. These 
programs can also smoothly align text and 
captions. Choose one vector-based pro- 
gram and stick with it for every poster and 
presentation, Hertig adds. “It’s important to 
invest the time early in your PhD. You won't 
have to learn it again. It will just be natural.” 

A quality poster is just one part of a suc- 
cessful presentation. At most conferences, 
the presenter will have at least a couple of 
hours to stand by their posters and inter- 
act with attendees. This is where some of 
the most important work at a conference 
takes place, which is why researchers 
should spend as much time polishing their 
pitches as they spend creating their poster, 
Salvagno says. He recommends preparing 
several different versions of one’s talking 
points: a 20-second elevator pitch for the 
mildly curious and a longer version for any- 
one who wants a deeper dive. 

For her part, van der Meer thinks that her 
presentation of her prizewinning poster was 
as important as the actual product. “You 
have to involve the audience by being open 
and enthusiastic,” she says. “The combina- 
tion of a clear poster and passionate pres- 
entation works best, because people will 
understand your work and get excited.”


Chris Woolston is a freelance writer in 
Billings, Montana. 

TURNING POINT
Kevin Esvelt 


Evolutionary engineer Kevin Esvelt, at
the Massachusetts Institute of Technology
in Cambridge, works with gene drives,
engineered bits of DNA that can cause a
mutation to become heritable all the time. He
calls for researchers to create and use safe lab
procedures while working with this powerful
but potentially risky technology.


What is a gene drive? 

In nature, a gene drive occurs when a DNA 
sequence spreads through a population by 
breaking the conventional rules of inheritance. 
For example, if an organism has a single copy
of a fluorescent marker gene and its mate has 
none, normally only half their offspring will 
fluoresce. When a gene-drive system is in play, 
almost all of them will glow. 


How can scientists use this capability? 

Gene drives allow us to drive altered traits 
through wild populations over generations. 
For instance, we could alter the DNA of wild 
mosquitoes to stop them from carrying dis- 
ease. We could restore damaged ecosystems 
and save endangered wildlife by genetically 
removing invasive species. 


How did your insights help to propel this field? 
Even ten years ago, heritable genome edit- 
ing was a possibility, but no one had found a 
molecular tool that would enable it to be done 
efficiently. In 2013, laboratories began using 
CRISPR to precisely edit the genomes of many 
species. I realized then that this tool could be 
used to build stable gene drives in many com- 
plex organisms. It could also be used to build 
reverse drives, which are like molecular erasers 
for overwriting previous edits. 


Why did you explain how gene drives would 
work before you published results showing 
that they could work in any organism? 

Most advances don't give individual scientists 
the power to affect entire ecosystems. By detail- 
ing what was possible, how it could be achieved 
and what safeguards were needed to prevent 
any accidental release of altered organisms 
from the lab, we hoped to set an example of 
how future work in gene drives should proceed. 


Why was this important? 

A single escaped organism that found a mate 
could eventually alter most of the local popula- 
tion and, very possibly, every population of that 
species worldwide. The ecological risk might 
be low, but the damage to public trust in bio- 
technology could imperil the future of the field. 


Did you want researchers to agree on some 
guidelines first? 

My immediate priority was to prevent the 
accidental release of any gene-drive organisms 
into the wild. I wrote to the few researchers 
working on gene drives to explain my concerns 
about ethics and safety. 


What happened? 

Last year, we released results showing that 
gene drives work in yeast. Then another 
group — who were working with fruit
flies — independently created a functional
gene-drive system. They were careful to keep 
the flies contained, but unlike our paper, 
their manuscript, which was meant to be 
published as a how-to for other labs, made 
no mention of safeguards or the risk to wild 
populations. To their credit, they agreed to 
include those details. 


Did your efforts help to usher in regulation? 
The fruit-fly case triggered responses from 
many scientists. For months, we struggled 
to agree on which safeguards should be 
used in the lab. We eventually published our 
recommendations in July 2015, and this year 
the US National Academy of Sciences released 
a report setting out how to conduct gene-drive 
research responsibly. 


Should gene-drive information be classified? 
Classifying such information would hinder 
beneficial applications and threaten biosecu- 
rity. We must know which species to monitor. 
Open science is the best defence and the best 
way to earn public support.


INTERVIEW BY VIJEE VENKATRAMAN 


This interview has been edited for length and clarity. 

SCIENCE FICTION


FLOATING IN MY TIN CAN


Lullaby for life.


BY GERRI LEEN


I thought that if I died in
space, I would take my last breaths
surrounded by stars and the detritus
of the fleet of the five federations.
But I’ve wandered off course, my new
nav system is failing, and now I’m running
blind with a life-support system
that is dying.

I have enemies. They know how to
sabotage a ship. I check: I always check.
But the new nav system’s readings must
have masked whatever they did to the
life support. It waited to fail until I was
too far from anywhere to call for help.

No beacons ping on my comm system,
not even from the farthest planets
out.

Genina, obsessive and damaged,
was from one of the outer planets. Her
voice was lovely when she sang. I wish I had
a recording of her, especially when she sang
the lullaby her mother had sung to her.

Her voice made me feel safe, the way I
hadn’t since I was a child. Before men from
the first federation came and took my parents
away. Before I was handed over to protective
services and sent to the children’s home,
where the refuse of the five federations go.

Where they taught freedom as a code
word for conforming. Where we had what
was expected drilled into us. But at night,
when the lights were out, Genina and I
fought back in the room that we shared.
We discovered things that made us unique
and incandescent, like the stars I longed to
fly among. Genina never wanted space; she
wanted to be free to sing her songs.

My dreams were of steering a ship anywhere
that would make us free.

In the children’s home, Genina would sing
quietly as we shared her bed, skin to skin,
the covers pulled over us, until one night the
matrons found out and separated us.

After that, I'd see Genina in the halls, and
when she spoke, her voice was different,
cracked and dusty, as if she'd been denied
water.

It was hard to get her alone, but I finally
did. She wouldn't touch me and she didn't
want to sing. But I begged and eventually she
sang the lullaby. I wept — what came out of
her mouth was no longer song.

She didn’t cry; she just touched my cheek
and went back to her room. They found her
floating in the river a week later. I think it
was my fault — if I hadn't asked her to sing,
she might not have realized what she'd lost.

I leave the ship on autopilot and check the
cryo units, normally well hidden, but there’s 
no reason to keep them camouflaged now 
that I’m floating blind. My cargo is asleep, 
deep in frozen dreams. They paid me to get 
them out of the five federations. They are 
singers, you see. 

I rescue singers from those who would 
destroy the songs. 

Or I did. Now... now I drift with them
safely in cryo and think about ways to make 
this right. 

I suppose I should console myself with 
knowing I did get them free of the five fed- 
erations. I just didn’t get them all the way. 
Would that be a consolation? That they’re 
free for now? 

They sang for me, these two lovers who 
wanted to share a cryo pod until I explained 
how that wouldn't work. The way they 
looked at each other reminded me of how 
Genina used to smile at me. Their voices 
trilled in half-step harmonies that made me 
shiver. So discordant and yet... beautiful. 

These two should have been in their new 
home by now, being lauded by audiences, 
after they'd sung one last song for me — I
asked all my runners for that. Credits, too, 
of course — a song wouldn't buy fuel — but 
that last song always sounded the sweetest. 
Hope coloured it with something beautiful. 

But now it’s hopeless
and we float and wait
for our air to run out.
It will be a while, from
time at all as far as the sleepers know.

Would they want to spend the time 
left to them singing? Or just dreaming 
of song? 

I finger the console. An almost 
musical combination of keystrokes 
will wake them up. 

Or I can let them sleep — the cold 
changing their slumber to a more last- 
ing rest as the power runs out, while 
the inside of the ship begins to resem- 
ble their cryo chamber and I freeze to 
death. 

But... I long to hear their voices 
again. Maybe they know Genina’s lull- 
aby. I didn’t ask when they boarded. 
They're not from her planet, but songs 
travel. 

I push down on the first key. With 
a ping, ‘Initiate awakening?’ blinks on 
the screen. 

I have done this before, five times. Each 
time I have chosen ‘No’.

I hit ‘No’ again.

I try to remember how Genina’s song 
went. My voice cracks and bends around 
notes that may be right but certainly sound 
nothing like those she made so magical. 

“I’m sorry,” I say to Genina, as I say every
night. 

As I say now to these two lovers, whom I 
have failed. 

As I say to myself, as I turn back to the 
command console, take my seat, and try to 
figure a way out of this even though I know 
there isn’t one. 

I think of the cascading trill of my two 
frozen lovebirds. Wouldn't they rather die 
together, knowing their time was ending? 
Singing? 

Or is that just what I want? 

Every time I go back to the cryo chambers, 
I ask myself this, and I let them sleep.

Genina’s lullaby would sound like a dirge 
in two-step harmony. It would be... fitting. 

No — I will let them sleep. 

Won't I?


Gerri Leen lives in Northern Virginia and 
originally hails from Seattle. She has stories 
and poems published by Daily Science 
Fiction, Escape Pod, Grimdark, Enchanted 
Conversation and others. Her first solo 
editing gig, the A Quiet Shelter There 
anthology published by Hadley Rille Books, 
was released in autumn 2015 and benefits 
homeless animals. See more at 
http://www.gerrileen.com. 